Persistent homology in graph power filtrations

The persistence of homological features in simplicial complex representations of big datasets in $\mathbb{R}^n$ resulting from Vietoris–Rips or Čech filtrations is commonly used to probe the topological structure of such datasets. In this paper, the notion of homological persistence in simplicial complexes obtained from power filtrations of graphs is introduced. Specifically, the $r$th complex, $r \geq 1$, in such a power filtration is the clique complex of the $r$th power $G^r$ of a simple graph $G$. Because the graph distance in $G$ is the relevant proximity parameter, unlike a Euclidean filtration of a dataset where regional scale differences can be an issue, persistence in power filtrations provides a scale-free insight into the topology of $G$. It is shown that for a power filtration of $G$, the girth of $G$ defines an $r$ range over which the homology of the complexes in the filtration is guaranteed to persist in all dimensions. The role of chordal graphs as trivial homology delimiters in power filtrations is also discussed and the related notions of 'persistent triviality', 'transient noise' and 'persistent periodicity' in power filtrations are introduced.


Introduction
Topological data analysis is concerned with determining the topological structure of data [1]. One approach to analysing large sets of discrete data points in $\mathbb{R}^n$ is to convert the dataset into a global topological object by replacing the dataset with a simplicial complex indexed by a Euclidean distance proximity parameter that defines the simplices of the complex. The powerful mathematical machinery of algebraic topology, e.g. [2], can then be applied to the complex in order to understand the fundamental topological properties of the dataset in terms of the topologically invariant homology groups associated with the data's simplicial complex representation. When combined with persistence theory and barcode theory, e.g. [3], these homology groups can often provide valuable insights about the underlying phenomena represented by the data. These methods have been used to study such diverse areas as sensor network coverage [4], random graphs and complex networks [5], shape analysis [6], brain topology [7], the evolution of viruses [8] and history [9]. The intrinsic mathematical appeal of these methods has also prompted mathematicians to further formalize their description and extend their utility (e.g. [10,11]).
Traditional persistence theory typically uses Vietoris–Rips (Rips for short) or Čech Euclidean filtrations of $\mathbb{R}^n$ datasets to generate a series of Rips or Čech simplicial complexes, each associated with a different value of the varying Euclidean distance proximity parameter. This paper introduces a variant of this persistence approach to topological analysis that is based upon the power filtration of a simple graph $G$. It uses the graph distance $r \geq 1$ in $G$ as the associated proximity parameter and generates the filtration by increasing the value of $r$. The $r$th complex in the filtration is the clique complex of the $r$th power $G^r$ of $G$ (such a filtration is called an $r$ filtration and each complex in the filtration is called an $r$ complex: note that an $r$ complex is a Rips complex whose simplices are subsets of the vertices of $G$ that are a distance of at most $r$ from each other in the discrete metric space defined by $G$). Although persistent homology in a Euclidean filtration yields information about a dataset's topology in $\mathbb{R}^n$ from the perspective of a one-parameter family of complexes whose vertices are the data points and whose simplices are defined by varying the Euclidean distance proximity parameter in $\mathbb{R}^n$, persistent homology in a power filtration of a graph provides topological insights about the graph from complexes whose vertices are those of the graph and whose simplices are defined using graph distance within the graph as the variable proximity parameter. Thus, unlike Euclidean filtrations where regional scale differences can be an issue, power filtrations can provide scale-free insights into graph topology. In both Euclidean and $r$ filtrations, persistent homology features are considered to reflect important topological properties. Homology features which do not persist are generally regarded as relatively unimportant 'topological noise'.
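The construction of an $r$ complex described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for the results of this paper: the adjacency-dictionary representation and the function names `graph_power` and `clique_complex` are assumptions made for the example, and vertex labels are assumed to be sortable.

```python
from collections import deque


def graph_power(adj, r):
    """Adjacency dict of G^r: u and v are adjacent iff 1 <= d_G(u, v) <= r."""
    power = {}
    for source in adj:
        # Breadth-first search truncated at depth r.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == r:
                continue
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        power[source] = set(dist) - {source}
    return power


def clique_complex(adj):
    """Simplices of the clique complex, as {dimension: [sorted vertex tuples]}."""
    vertices = sorted(adj)
    simplices = {0: [(v,) for v in vertices]}
    k = 0
    while simplices[k]:
        higher = []
        for s in simplices[k]:
            # Extend each k-clique by a larger vertex adjacent to all its members.
            for v in vertices:
                if v > s[-1] and all(v in adj[u] for u in s):
                    higher.append(s + (v,))
        k += 1
        simplices[k] = higher
    del simplices[k]  # the last level is always empty
    return simplices


# Path graph on four vertices: 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
second_complex = clique_complex(graph_power(path, 2))
```

For the path graph, the second complex contains the two triangles (0, 1, 2) and (1, 2, 3), and the third power is already the complete graph on four vertices.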
Because of the increasing interest in topological data analysis, much recent attention has been devoted to the development of software packages which can perform persistent homology computations with relative efficiency, e.g. [12-14] (it is interesting to note that a quantum mechanical algorithm has recently been developed that will exponentially speed up these computations, but the quantum computers required to execute the algorithm do not yet exist [15]). It is shown below that in certain cases the girth of $G$ can be used to reduce or eliminate such computations for $r$ filtrations of $G$ by defining not only a range of $r$ index values for which all of the homology features in all dimensions of the associated $r$ complexes remain unchanged, but also a lower bound for the persistence lifetimes of every such feature. Although quantifying the girth of $G$ can also place demands on computational resources, its evaluation can be cost-effective when compared with the resources required to compute all of the homology groups for each $r$ complex in the $r$ range (as noted below, the easily computed Randić index can be used to bound girth).
It will also be shown that chordal graphs, which have homologically trivial clique complexes, serve several important functions in $r$ filtrations. As the power filtration of a connected graph $G$ stabilizes at some power $r_s(G)$ (or $r_s$ when the graph referenced is clear) as a complete graph (which is a chordal graph), the associated filtration of $r$ complexes stabilizes at the stabilization distance $r_s(G)$ as a homologically trivial simplex. In addition, if, in a power filtration of a connected graph $G$, $G^{r_c(G)} \equiv G^{r_c}$, $1 < r_c < r_s$, is a chordal graph, then, until the stabilization distance $r_s$ is reached, the complexes associated with $(G^{r_c})^{2j+1}$, $j \geq 1$, are all homologically trivial. This is a persistent periodic homology feature peculiar to $r$ filtrations that can also induce persistent periodicity in transient noise, i.e. 'topological noise' having the smallest possible lifetime in an $r$ filtration. It can also be the case that all of the complexes associated with $(G^{r_c})^j$, $j \geq 1$, are homologically trivial (when $G^{r_c}$ contains no sunflower subgraph; see the next section). If this occurs, the filtration exhibits persistent trivial homology when $r \geq r_c$. In the extreme case where the complex associated with $G^r$ is homologically trivial for all $1 \leq r \leq r_s$, the filtration is said to exhibit persistent triviality.
In order to make this paper relatively self-contained, relevant definitions and terminology are summarized in the next section. The theory of persistent homology in r filtrations (i.e. r persistence) is introduced in §3. Required preliminary lemmas are stated in §4 and the main results are developed in §5. Illustrative examples are presented in §6 and closing remarks comprise the paper's final section.

Definitions and terminology
The degree $\deg(u)$ of vertex $u \in V(G)$ is the number of edges incident to $u$ and the sum $\sum_{\{u,v\} \in E(G)} [\deg(u)\deg(v)]^{-1/2}$ is the Randić index $R(G)$ of $G$. A graph is regular if each of its vertices has the same degree. Graphs G and F are isomorphic graphs A $u - v$ walk is an alternating sequence of vertices and edges beginning with $u$ and ending with $v$ such that every edge joins the vertices immediately preceding and following it. A $u - v$ path is a $u - v$ walk in which no vertex is repeated and the number of edges it contains is its length. In this case, $u$ is said to be connected to $v$. $G$ is connected if its order is one or if every two vertices in $G$ are connected. The graph distance $d(u, v)$ between $u, v \in V(G)$ is the minimum length of all $u - v$ paths and the diameter $\mathrm{diam}(G)$ of $G$ is $\max_{u,v \in V(G)} d(u, v)$. Clearly, $r_s(G) = \mathrm{diam}(G)$. A $u - v$ path for which $u = v$ and which contains at least three edges is a cycle. The graph $C_n$ is the cycle graph on $n$ vertices. The length of a cycle is the number of edges contained within it and the length of the shortest cycle of $G$ is the girth $g(G)$ of $G$. A chord of a cycle is an edge between non-consecutive vertices in the cycle.
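The graph invariants defined in this paragraph are straightforward to evaluate. The following Python sketch computes the Randić index, the diameter (and hence $r_s(G)$ for a connected graph) and the girth; the adjacency-dictionary representation and all function names are assumptions introduced for illustration, and the graph is assumed to be simple with sortable vertex labels.

```python
from collections import deque
from math import inf, sqrt


def randic_index(adj):
    """Sum of (deg(u) * deg(v)) ** -0.5 over all edges {u, v}."""
    return sum(1.0 / sqrt(len(adj[u]) * len(adj[v]))
               for u in adj for v in adj[u] if u < v)


def bfs_distances(adj, source):
    """Graph distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Largest graph distance; assumes the graph is connected."""
    return max(max(bfs_distances(adj, u).values()) for u in adj)


def girth(adj):
    """Length of a shortest cycle (inf for a forest), by BFS from every vertex."""
    best = inf
    for source in adj:
        dist, parent = {source: 0}, {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v], parent[v] = dist[u] + 1, u
                    queue.append(v)
                elif parent[u] != v:
                    # A non-tree edge closes a cycle through the BFS tree.
                    best = min(best, dist[u] + dist[v] + 1)
    return best


# Cycle graph C_5.
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
```

For $C_5$ every vertex has degree two, so each of the five edges contributes $1/2$ to the Randić index, while both the girth and the order equal five and the diameter is two.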
The $r$th power $G^r$, $r \geq 1$, of $G$ is the graph with vertex set $V(G^r) = V(G)$ and for which $\{u, v\} \in E(G^r)$ if, and only if, the distance between $u$ and $v$ in $G$ is at most $r$. A graph is a chordal graph if every cycle of length at least four has a chord. A graph $K_n$ is a complete graph on $n$ vertices when $n = |V(G)| = 1$ or every two of its vertices are adjacent.
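Chordality can be tested directly from this definition's standard characterization by perfect elimination orderings. The sketch below uses maximum cardinality search, a well-known recognition technique that is not necessarily the algorithm of the references cited in this paper; the function name `is_chordal` and the adjacency-dictionary input are assumptions made for the example.

```python
from itertools import combinations


def is_chordal(adj):
    """Chordality test by maximum cardinality search (MCS).

    Vertices are visited in MCS order; the graph is chordal iff, for every
    vertex, its already-visited neighbours are pairwise adjacent.
    """
    weight = {v: 0 for v in adj}   # number of visited neighbours
    seen = set()
    while len(seen) < len(adj):
        v = max((u for u in adj if u not in seen), key=lambda u: weight[u])
        earlier = adj[v] & seen
        if any(b not in adj[a] for a, b in combinations(earlier, 2)):
            return False           # earlier neighbours do not form a clique
        seen.add(v)
        for u in adj[v]:
            if u not in seen:
                weight[u] += 1
    return True


# C_4 is the smallest non-chordal graph; adding the chord {0, 2} repairs it.
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
c4_with_chord = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
```

Here `is_chordal(c4)` is false and `is_chordal(c4_with_chord)` is true, in agreement with the definition above.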
An abstract simplicial complex $S$ on a finite set $A$ is a family of sets $\{\sigma \in S : \sigma \subseteq A\}$ such that: The clique complex $\mathcal{C}(G)$ of a graph $G$ is the abstract simplicial complex whose faces are the cliques of $G$. Associated with each complex $S$ is a chain complex of abelian groups $\Gamma_k(S)$ and homomorphisms $\partial_{k+1} : \Gamma_{k+1}(S) \to \Gamma_k(S)$, $k \geq 0$, where $\ker \partial_k$ are the $k$-cycles in $S$ and $\mathrm{im}\, \partial_{k+1}$ are the $k$-boundaries in $S$. If $\rho_k(S)$ is the number of $k$-simplices in $S$, then $\Gamma_k(S)$ is isomorphic to (denoted '$\approx$') the direct sum '$\oplus$' of $\rho_k(S)$ copies of the additive group of integers $\mathbb{Z}$. The $k$th homology group of $S$ is the quotient group $H_k(S) = \ker \partial_k / \mathrm{im}\, \partial_{k+1}$ which captures equivalence classes of non-bounding $k$-cycles by factoring out boundary cycles. If $S$ is comprised of $m$ connected components, then $H_0(S) \approx \mathbb{Z} \oplus \mathbb{Z} \oplus \cdots \oplus \mathbb{Z}$ ($m$ copies of $\mathbb{Z}$) and if $S$ is of dimension $\delta$, then it has $\delta + 1$ homology groups. $S$ is homologically trivial when $H_0(S) \approx \mathbb{Z}$ and $H_k(S) \approx 0$, $k \geq 1$.
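The ranks of the free parts of the groups $H_k(S)$ (the Betti numbers) follow from the ranks of the boundary homomorphisms, since $\mathrm{rank}\, H_k(S) = \rho_k(S) - \mathrm{rank}\, \partial_k - \mathrm{rank}\, \partial_{k+1}$. The following Python sketch evaluates them by exact rational Gaussian elimination. It is a sketch under stated assumptions: the input format (a dictionary from dimension to sorted vertex tuples, closed under taking faces) and the function names are hypothetical, and rational ranks do not detect torsion in $H_k(S)$.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of an integer matrix, by exact elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):
            if rows[i][col] != 0:
                factor = rows[i][col] / rows[rank][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def betti_numbers(simplices):
    """Betti numbers [b_0, ..., b_top] of a complex {dim: [vertex tuples]}."""
    top = max(simplices)
    rank = {}  # rank[k] is the rank of the boundary map from k-chains
    for k in range(1, top + 1):
        index = {s: i for i, s in enumerate(simplices[k - 1])}
        rows = []
        for s in simplices[k]:
            row = [0] * len(index)
            for i in range(len(s)):
                # The face omitting the i-th vertex carries sign (-1)^i.
                row[index[s[:i] + s[i + 1:]]] = (-1) ** i
            rows.append(row)
        rank[k] = matrix_rank(rows)
    return [len(simplices[k]) - rank.get(k, 0) - rank.get(k + 1, 0)
            for k in range(top + 1)]


# Boundary of a triangle (a 1-cycle) and the filled triangle.
hollow = {0: [(0,), (1,), (2,)], 1: [(0, 1), (0, 2), (1, 2)]}
filled = {**hollow, 2: [(0, 1, 2)]}
```

The hollow triangle has Betti numbers $[1, 1]$ (one component and one 1-cycle that does not bound), whereas the filled triangle has $[1, 0, 0]$ and is homologically trivial.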
If S and T are abstract simplicial complexes, then a simplicial map ϕ : S → T is a map ϕ :

Persistent homology theory for power filtrations of graphs
When applied to a complex, homology detects the presence of connected components, holes and voids in the complex. Rather than use the homology of a single complex as a description of the topology of a graph, it can be preferable to describe a graph's topology by identifying topological features detected by the homology that persist in a filtration of the graph. As already mentioned, persistent homological features can indicate potentially important topological properties of a graph, whereas features which do not persist can generally be regarded as relatively unimportant 'topological noise'.
Useful insights into the topology of a simple graph $G$ can be obtained from an understanding of the homology of clique complexes derived from the $r$th powers $G^r$ of $G$ (e.g. if $G$ is connected and $\mathrm{diam}(G)$ is known, $r$ can provide a measure of how close $G$ is to being a homologically trivial simplex). To this end, let $r$ vary over an appropriate distance range $1 \leq r \leq p$ within $G$ to produce the $r$ filtration and induced homology sequence given by the diagrams
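Under the assumption that each map is indexed by the power of its source complex (an indexing convention adopted here for illustration only), the filtration and its induced homology sequence in dimension $k$ can be written as:

```latex
\[
\mathcal{C}(G) \xrightarrow{\;i_1\;} \mathcal{C}(G^2) \xrightarrow{\;i_2\;} \cdots
\xrightarrow{\;i_{p-1}\;} \mathcal{C}(G^p)
\]
\[
H_k(\mathcal{C}(G)) \xrightarrow{\;\phi^*_1\;} H_k(\mathcal{C}(G^2))
\xrightarrow{\;\phi^*_2\;} \cdots
\xrightarrow{\;\phi^*_{p-1}\;} H_k(\mathcal{C}(G^p))
\]
```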
where '$\xrightarrow{i_j}$' and '$\xrightarrow{\phi^*_j}$' denote simplicial inclusion maps and group homomorphisms, respectively. As already noted in §1, if $G$ is connected, then the filtration stabilizes with the homologically trivial simplex $\mathcal{C}(G^{r_s})$. In this case, the homology sequence for $k > 0$ can be extended to include the groups where '•' denotes 'composition of homomorphisms'. The $r$ persistence lifetime of $c$ is then $\lambda(c) = l - j$ and is represented by the lifetime interval $[j, l]$. A non-zero homology class $[c]$ that is 'born' at $\mathcal{C}(G)$ and 'dies' at $\mathcal{C}(G^{r_s})$ has a lifetime $\lambda(c) = r_s(G) - 1$ and is said to be an $r_s(G)$ survivor. Transient noise is a class The lifetime of transient noise is $\lambda(c) = 1$ and is represented by the lifetime interval $[j, j + 1]$. Note that transient noise corresponds to the smallest possible non-zero lifetime that can exist in an $r$ filtration. A visualization of a complete $r$ persistence analysis of $G$ is given by the complete $r$ persistence barcode for $G$ which is a graphical representation of the multiset of all lifetime intervals for finite $k$. The binary $r$ persistence barcode for $G$ is a binary string $\beta(G)$ of length $r_s(G)$ where the $r$th entry corresponds to $\mathcal{C}(G^r)$ and is 0 if $H_k(\mathcal{C}(G^r)) \approx 0$ for all $k > 0$, and 1 if $H_k(\mathcal{C}(G^r)) \not\approx 0$ for some $k > 0$. For example, if $\beta(G)$ is a string of $r_s$ zeros, then $G$ exhibits persistent triviality which indicates that the topology of $G$ is effectively homologically featureless (and possibly uninteresting) for all powers of $G$. However, if 010 is a substring of $\beta(G)$, then for the $r$ corresponding to the position of the 1 in $\beta(G)$, the complex $\mathcal{C}(G^r)$ generates transient noise in at least one dimension $k > 0$. This indicates that relationships exist in subsets of $G$'s vertices that are manifested as short-lived (but possibly interesting) topological features in $\mathcal{C}(G^r)$.
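The binary barcode $\beta(G)$ lends itself to a direct computation. The Python sketch below builds each power $G^r$, forms its clique complex and tests for homology in positive dimensions. Several assumptions are made: all function names are hypothetical, $G$ is assumed connected with sortable vertex labels, and ranks are computed over the two-element field rather than over $\mathbb{Z}$, so a 0 entry certifies only that there is no free part and no 2-torsion (odd torsion would go undetected).

```python
from collections import deque


def graph_power(adj, r):
    """Adjacency dict of G^r (vertices at graph distance 1..r are adjacent)."""
    power = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == r:
                continue
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        power[source] = set(dist) - {source}
    return power


def clique_complex(adj):
    """Cliques of the graph grouped by dimension, as sorted vertex tuples."""
    vertices = sorted(adj)
    simplices = {0: [(v,) for v in vertices]}
    k = 0
    while simplices[k]:
        simplices[k + 1] = [s + (v,) for s in simplices[k] for v in vertices
                            if v > s[-1] and all(v in adj[u] for u in s)]
        k += 1
    del simplices[k]
    return simplices


def gf2_rank(rows):
    """Rank over GF(2); each row is a Python int used as a bit vector."""
    basis = {}
    for row in rows:
        while row:
            high = row.bit_length() - 1
            if high not in basis:
                basis[high] = row
                break
            row ^= basis[high]
    return len(basis)


def has_nontrivial_homology(simplices):
    """True if some GF(2) Betti number in a dimension k > 0 is non-zero."""
    top = max(simplices)
    rank = {top + 1: 0}
    for k in range(1, top + 1):
        index = {s: i for i, s in enumerate(simplices[k - 1])}
        rank[k] = gf2_rank([sum(1 << index[s[:i] + s[i + 1:]] for i in range(len(s)))
                            for s in simplices[k]])
    return any(len(simplices[k]) - rank[k] - rank[k + 1] > 0
               for k in range(1, top + 1))


def binary_barcode(adj):
    """Binary barcode of a connected graph, one character per power r = 1..r_s."""
    n, bits = len(adj), []
    for r in range(1, max(n, 2)):
        power = graph_power(adj, r)
        bits.append('1' if has_nontrivial_homology(clique_complex(power)) else '0')
        if all(len(nbrs) == n - 1 for nbrs in power.values()):
            break  # G^r is complete, so r is the stabilization distance
    return ''.join(bits)


def cycle_graph(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
```

As a usage example, `binary_barcode(cycle_graph(6))` evaluates to `'110'`: $C_6$ itself carries a 1-cycle, the clique complex of $C_6^2$ is the boundary of an octahedron (a 2-sphere), and $C_6^3$ is complete. A path graph on four vertices gives `'000'`, i.e. persistent triviality.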

Preliminary lemmas
Results needed to prove or discuss the main results of this paper are presented in this section for the reader's convenience. The following lemmas have been established elsewhere and are stated here without proof.

Main results
In what follows, it is assumed that $G$ is a simple graph. However, before continuing, for completeness, the above observations concerning $r$ filtration stabilization are generalized as the following theorem. Since the consequence of the theorem is obvious, the theorem is stated without proof.

Persistent homology in r filtrations
The last theorem suggests the following persistent homology theorem for r filtrations:

Theorem 5.2. If G has m ≥ 1 connected components, then the m non-zero homology classes of
Proof. This is an obvious consequence of theorem 5.1 and the fact that, since $r$ is a graph distance in $G$, the number of connected components remains invariant in $G^r$, $r \geq 1$.
Hereafter, for the sake of simplicity and without loss of generality, it will be assumed that G is a connected graph.
The next result shows that the girth of $G$ defines a power index range over which non-zero homology classes of $\mathcal{C}(G)$ are guaranteed to persist.

Theorem 5.3. For some counting number l
Proof. Lemma 4.1 implies that $\mathcal{C}(G^{r-1})$ and $\mathcal{C}(G^r)$ are homotopy equivalent complexes for $2 \leq r \leq l$. Consequently, their homology groups are isomorphic for $k > 0$ and the sequence of homology groups and induced homomorphisms exists, where each $i^*_j$ is an isomorphism. If $0 \neq [c] \in H_k(\mathcal{C}(G))$, then, since each $i^*_j$ in the sequence is an isomorphism, it must be the case that Now let $q \in \{l, l + 1, l + 2, \ldots, r_s - 1\}$ be the smallest integer such that $\phi^*_q$ It is clear that theorem 5.3 only applies when $g(G) \geq 10$ and is most useful for graphs with extremely large girths. Lemma 4.2 provides a relatively straightforward and quick method for determining if $g(G)$ is large enough to apply theorem 5.3.
The importance of chordal graphs as delimiters for persistent trivial homology in $r$ filtrations is expressed in the next theorem. Chordal graphs have been studied extensively over the last several decades and efficient algorithms have been developed which can recognize when a graph is chordal (e.g. [22]). Proof. Since $G^{r_c}$ is a chordal graph with no sunflower, then $(G^{r_c})^j$ is chordal for all $j \geq 1$ (lemma 4.4). Consequently, $\mathcal{C}((G^{r_c})^j)$, $j \geq 1$, are homologically trivial complexes so that $H_k(\mathcal{C}((G^{r_c})^j)) \approx 0$, $k > 0$, for $j \geq 1$ (lemma 4.3). It must therefore be the case that for $k > 0$, all non-zero homology classes $[c]$ are 'born' and 'die' in $\mathcal{C}(G^r)$ with $r < r_c$. Since $r = 1$ is the smallest power index for which $[c] \neq 0$ can be 'born' and $r = r_c$ is the largest power index for which $[c] \neq 0$ can 'die', then the persistence lifetime of $c$ can be no greater than $r_c - 1$ when $k > 0$.

Persistent periodicity and transient noise in r filtrations
While it is the case that not all chordal graphs are closed under powers [23] (a situation for which closure occurs (lemma 4.4) has been applied in theorem 5.4), it is nonetheless true that every odd power of a chordal graph $G$ is also chordal (lemma 4.5). Thus, the presence of a chordal graph at $r_c(G) < r_s(G)$ in an $r$ filtration guarantees, at a minimum, that a periodic homological triviality persists in the associated complexes in the filtration. In what follows, it will be assumed that all power indices do not exceed the stabilization distance $r_s(G)$.
Proof. As $G^{r_c} = (G^{r_c})^1$ is a chordal graph, then so are the graphs $(G^{r_c})^{2j+1}$, $j \geq 1$ (lemma 4.5). The fact that the homology of the clique complexes of these graphs is trivial follows from lemma 4.3.
Thus, every entry corresponding to $\mathcal{C}((G^{r_c})^{2j+1})$, $j \geq 0$, in the associated $\beta(G)$ binary barcode is 0, i.e. these 0 entries repeat in $\beta(G)$ with a period of 2.
As indicated by lemma 4.6, it can be the case that $G^2$ is not chordal even though $G$ is (however, if $G^2$ is chordal, then all powers $G^r$, $1 \leq r < r_s$, of $G$ are necessarily chordal [20] and $G$ exhibits persistent triviality). This situation can produce transient 'topological noise' that is a feature peculiar to $r$ filtrations and is described by the following corollary. Because this result is an obvious consequence of the last theorem, it is stated without proof.

Corollary 5.6. Assume that $G$ is a chordal graph and $G^2$ is not a chordal graph. Then every non-zero homology class
A special case of this, persistent periodic transient noise, arises when every $\mathcal{C}(G^{2j})$, $j \geq 1$, has at least one non-zero homology class in some non-zero dimension. In this case, the entries in the binary barcode $\beta(G)$ corresponding to $\mathcal{C}(G^{2j})$, $j \geq 1$, are 1 (i.e. 1 repeats with period 2) and those corresponding to $\mathcal{C}(G^{2j+1})$, $j \geq 1$, are 0 (i.e. 0 repeats with period 2). Thus, beginning with the first location, which corresponds to $\mathcal{C}(G)$, the pattern 01 is repeated for the remainder of the string (of course, the $r_s$th and final entry in $\beta(G)$ is 0). Note that even if $H_k(\mathcal{C}(G^2)) \approx 0$, $k > 0$, the fact that $G^2$ is not chordal signals a structural change in the relationships between subsets of vertices of $G$ that may provide useful insights into the properties of $G$ (e.g. see the sunflower graph example in the next section).
It is interesting to note (lemmas 4.4 and 4.6) the significance of sunflower subgraphs $S_n$ in determining the persistence and periodicity of homological triviality in $r$ filtrations. Obviously, if $S_n \not\subseteq G$ and $G$ is chordal, then $G$ is closed under powers, the clique complexes of all powers of $G$ (through its stabilization distance) are homologically trivial, and $G$ exhibits persistent triviality. Perhaps not so obvious is the case where a sunflower subgraph in a chordal graph $G$ prevents $G^2$ from being chordal. However, with a little reflection, it is easily seen that consecutive pairs of the independent vertices $\{u_1, u_2, \ldots, u_n\}$ for each $S_n \subseteq G$ are separated by a graph distance of two and are therefore adjacent in $G^2$. This forms chordless cycles $C_n$ in $G^2$, thereby rendering it non-chordal and producing non-zero homology classes in $H_1(\mathcal{C}(G^2))$. Clearly, when $S_n$, $n \geq 4$, is suspended in $G$, then $G$ cannot be chordal because the suspension itself induces chordless cycles in $G$. Consequently, $\mathcal{C}(G)$ exhibits a non-trivial homology because these cycles generate non-zero homology classes in $H_1(\mathcal{C}(G))$.
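The effect of a sunflower on chordality under powers can be checked numerically. The Python sketch below uses a hypothetical instance that is not one of this paper's figures: a 4-sun whose central vertices $\{0, 1, 2, 3\}$ form a clique and whose independent vertices $\{4, 5, 6, 7\}$ are each adjacent to two consecutive central vertices. The function names and the chordality test (maximum cardinality search) are likewise assumptions made for the example.

```python
from collections import deque
from itertools import combinations


def graph_power(adj, r):
    """Adjacency dict of G^r (vertices at graph distance 1..r are adjacent)."""
    power = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == r:
                continue
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        power[source] = set(dist) - {source}
    return power


def is_chordal(adj):
    """Maximum cardinality search: visited neighbours must always form a clique."""
    weight = {v: 0 for v in adj}
    seen = set()
    while len(seen) < len(adj):
        v = max((u for u in adj if u not in seen), key=lambda u: weight[u])
        if any(b not in adj[a] for a, b in combinations(adj[v] & seen, 2)):
            return False
        seen.add(v)
        for u in adj[v]:
            if u not in seen:
                weight[u] += 1
    return True


def four_sun():
    """Hypothetical 4-sun: clique {0,1,2,3}; vertex 4+i adjacent to i and i+1 (mod 4)."""
    adj = {v: set() for v in range(8)}
    edges = list(combinations(range(4), 2))
    edges += [(4 + i, i) for i in range(4)] + [(4 + i, (i + 1) % 4) for i in range(4)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


sun = four_sun()
chordal_by_power = [is_chordal(graph_power(sun, r)) for r in (1, 2, 3)]
```

For this instance the list is `[True, False, True]`: the graph is chordal, its square contains the chordless 4-cycle on the independent vertices, and its cube (an odd power) is chordal again, being the complete graph on eight vertices.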
This example also illustrates theorem 5.4. In particular, as $H_1(\mathcal{C}(C_{11}^4)) \approx \mathbb{Z}$, then $C_{11}^4$ is not chordal (via contrapositive of lemma 4.3 because $C_{11}^4$ is connected). It follows that $r_c(C_{11}) = r_s(C_{11}) = 5$ and that   Because there are no non-zero homology classes in $H_1(\mathcal{C}(S_4^2))$ and therefore no transient noise, these results are consistent with corollary 5.6. Also, as per the discussion in the last section, the fact that $S_4$ is chordal and $S_4^2$ is not chordal could indicate potentially useful insights into the properties of $S_4$ (e.g. although vertices 1, 6, 8 and 3 are unrelated in $S_4$, the vertices in the vertex pairs {1,3}, {3,8}, {6,8} and {1,6} are close enough in $S_4$ to be related by a single unit change in $r$).
Both Rips filtrations and Čech filtrations are used in the analysis of large datasets in $\mathbb{R}^n$. Recall that the Čech complex associated with a finite collection $\Sigma$ of data points in $\mathbb{R}^n$ is the abstract simplicial complex whose $k$-simplices are those subsets of $k + 1$ data points in $\Sigma$ whose closed ball neighbourhoods of radius $\beta$ have a common point of intersection, whereas points in the simplices in the associated Rips complex are pairwise within a distance $\varepsilon$. A Čech (Rips) filtration of $\Sigma$ is performed by varying the value of the ball radius $\beta$ (the value of $\varepsilon$) and constructing a Čech (Rips) complex for each $\beta$ ($\varepsilon$) value of interest.
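The difference between the two constructions already appears at the level of 2-simplices. The Python sketch below lists the triangles of a Rips complex and of a Čech complex on a small point set; it is an illustration only, with hypothetical function names, and it relies on the elementary fact that three closed balls of radius $\beta$ share a point exactly when the smallest ball enclosing their centres has radius at most $\beta$.

```python
from itertools import combinations
from math import dist, sqrt


def rips_triangles(points, eps):
    """Index triples whose points are pairwise within distance eps."""
    return [t for t in combinations(range(len(points)), 3)
            if all(dist(points[i], points[j]) <= eps for i, j in combinations(t, 2))]


def min_enclosing_radius(p, q, s):
    """Radius of the smallest ball containing three points."""
    a, b, c = sorted((dist(p, q), dist(q, s), dist(p, s)))
    if c * c >= a * a + b * b:
        # Right, obtuse or degenerate triangle: the longest side is a diameter.
        return c / 2
    # Acute triangle: circumradius, with the area from Heron's formula.
    area = 0.25 * sqrt((a + b + c) * (-a + b + c) * (a - b + c) * (a + b - c))
    return a * b * c / (4 * area)


def cech_triangles(points, beta):
    """Index triples whose closed balls of radius beta have a common point."""
    return [t for t in combinations(range(len(points)), 3)
            if min_enclosing_radius(*(points[i] for i in t)) <= beta]


# Equilateral triangle with unit sides.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, sqrt(3) / 2)]
```

For the unit equilateral triangle, `rips_triangles(triangle, 1.01)` contains the 2-simplex, while `cech_triangles(triangle, 0.505)` is empty because the circumradius $1/\sqrt{3} \approx 0.577$ exceeds the ball radius; the 2-simplex enters the Čech complex only once $\beta$ reaches that value.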
In what follows, a Čech filtration of a dataset $\Sigma$ of 30 points in $\mathbb{R}^n$ is used to construct a series of graphs for Čech complex representations of $\Sigma$. For the purpose of comparison, an $r$ filtration of an associated relative neighbourhood graph $R$ representation of $\Sigma$ is used to highlight several advantages that can be obtained by using power filtrations instead of Euclidean filtrations in the analysis of large datasets.
The graphs of four Čech complexes obtained from a Čech filtration of the 30 point dataset $\Sigma$ are shown in figure 2 for ball radius values $\beta$ = 0.22, 0.41, 0.55 and 0.71. As can be seen in the figure, the disconnected components do not persist: the graph of the initial complex $\check{C}_{0.22}$ at $\beta = 0.22$ is highly disconnected and the distribution of connected components of the graph of $\check{C}_{0.22}$ in the figure is 'circular'. As $\beta$ increases, the number of connected components in the graphs of the associated complexes decreases until the graph of the complex $\check{C}_{0.71}$ is completely connected around a 'large one-dimensional hole' $\gamma$ at $\beta = 0.71$. Although the homology group sequence $H_0(\check{C}_{0.22})$  the rank of the zero-dimensional homology groups decreases from 14 to 1 provides a good description of the Euclidean compactness of $\Sigma$ in $\mathbb{R}^n$ (the presence of $\gamma$ would be detected by $H_1(\check{C}_{0.71})$).
A relative neighbourhood graph [24] on a dataset $X$ has $X$ as its vertex set with an edge between Here, $B(x, \rho)$ is the open ball in $\mathbb{R}^n$ of radius $\rho$ and centre $x$, and $d_{ij} = d(x_i, x_j)$ is the Euclidean distance between $x_i$ and $x_j$. The importance of a relative neighbourhood graph of $X$ is that it provides a single graph representation of $X$ that serves as a 'primal sketch' of its topological features.
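A relative neighbourhood graph can be computed by brute force. The Python sketch below assumes the standard definition, in which $x_i$ and $x_j$ are joined exactly when no third data point lies in $B(x_i, d_{ij}) \cap B(x_j, d_{ij})$; this reading of the definition, together with the function name and the adjacency-dictionary output, is an assumption made for illustration.

```python
from math import dist


def relative_neighbourhood_graph(points):
    """Adjacency dict (by point index) of the relative neighbourhood graph.

    Points i and j are adjacent iff no third point k is strictly closer
    than d_ij to both of them.
    """
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = dist(points[i], points[j])
            blocked = any(
                max(dist(points[i], points[k]), dist(points[j], points[k])) < d_ij
                for k in range(n) if k != i and k != j
            )
            if not blocked:
                adj[i].add(j)
                adj[j].add(i)
    return adj


# Corners of a unit square: the sides survive, the diagonals are blocked.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rng_square = relative_neighbourhood_graph(square)
```

For the unit square the result is the 4-cycle on the corners, since each diagonal has a corner at distance one from both of its endpoints. The cubic running time of this sketch is adequate for datasets of the size considered here.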
As a final example, consider the relative neighbourhood graph $\Gamma$ representation of another dataset $\Psi \supset \Sigma$ consisting of 40 data points in $\mathbb{R}^n$ shown in figure 4a. Observe that the single graph $\Gamma$ immediately discerns three cycles (at different scales), whereas they only gradually emerge from a Čech filtration of $\Psi$ as $\beta$ increases. The power filtration of $\Gamma$ provides essentially the same information about $\Psi$ as the Čech filtration but, unlike the Čech filtration, it has the advantage that (in this case) it provides this information with a single graphical representation of $\Psi$ so that $\Psi$ is accessed only once to compute $\Gamma$. It is easy to see from theorem 5.2 that $H_0(\mathcal{C}(\Gamma^r)) \approx \mathbb{Z}$, $1 \leq r \leq r_s$. However, observe that as $g(\Gamma) = 5 < 10$, theorem 5.3 cannot be applied to the filtration of $\Gamma$.

Closing remarks
This paper has introduced the notion of using homological persistence in simplicial complexes obtained from power filtrations of simple graphs as an approach to probing their topological structure. This method is especially useful when applied to graphs with girths of at least 10. In these cases, the homologies of complexes in the filtration remain isomorphic and persist over a range of power indices that increases with increasing girth. An interesting feature of power filtrations of graphs is the fact that the emergence of a pre-stabilization chordal graph in a filtration signals the presence of trivial or periodic homology which persists for the remainder of the filtration.
Using datasets in $\mathbb{R}^n$ as examples, it is also suggested that: (i) for those cases where a data-derived graph is designed to elicit information of interest and the graph can be computed efficiently, power filtration provides an alternative approach to persistent homology analysis while providing information comparable to that obtained from Čech or Rips filtrations and (ii) because they require only an initial single graph representation of a dataset, power filtrations tend to reduce 'topological noise' in many practical applications.
In closing, it is important to note that the results of this paper can be applied directly to such cases as physical network and social network analysis where the data are naturally represented as a simple graph. In addition, in manifold learning a single simple graph is constructed and used to produce a data embedding into a lower dimensional space. The results of this paper provide a mechanism for understanding the topology of this graph along with the potential for producing additional information relevant to the associated embedding [25].

    theta <- seq(0, 2*pi, length=N+1)[-1]
    z <- cbind(cos(theta) + rnorm(N, 0, .1), sin(theta) + rnorm(N, 0, .1))

The following R code was used to append dataset Σ with 10 additional points to produce the example 40 point dataset Ψ in §6:

    set.seed(89820)
    r <- .1
    n <- 10
    thetar <- seq(0, 2*pi, length=n+1)[-1]
    x <- cbind(r*cos(thetar) + rnorm(n, 0, .1*r), r*(sin(thetar) + rnorm(n, 0, .1*r)))
    y <- rbind(z, x)