Estimating network dimension when the spectrum struggles

What is the dimension of a network? Here, we view it as the smallest dimension of Euclidean space into which nodes can be embedded so that pairwise distances accurately reflect the connectivity structure. We show that a recently proposed and extremely efficient algorithm for data clouds, based on computing first- and second-nearest neighbour distances, can be used as the basis of an approach for estimating the dimension of a network with weighted edges. We also show how the algorithm can be extended to unweighted networks when combined with spectral embedding. We illustrate the advantages of this technique over the widely used approach of characterizing dimension by visually searching for a suitable gap in the spectrum of the Laplacian.


Motivation
Given a network, it is often desirable to embed the nodes into Euclidean space so that distances between nodes reflect the connection strengths. Such an embedding may form a preprocessing step for visualization, clustering, or semi-supervised learning of labels [3]. Spectral techniques, based on eigenvectors of a suitable Laplacian, are commonly used for the projection. In this case, choosing the dimension of the embedding, that is, the number of eigenvectors used, is an important task. However, this task is difficult to formalize, with a widely accepted rule of thumb being "look for a gap in the spectrum" [28].
In this work we investigate the use of the recently proposed two nearest neighbour, or twoNN, algorithm in [13] as a means to inform the choice of dimension. The algorithm is designed to estimate the dimension of a cloud of data points, assuming the points are samples from a continuous manifold. It is extremely efficient compared, for example, with box counting techniques, requiring only the distances from each point to its first and second nearest neighbours. In the case of a weighted (undirected) network for which edge weights can be used to define the desired distances, the algorithm is directly applicable. We find that it performs well on examples where a ground truth is available and where information from the Laplacian spectrum is at best ambiguous. In the case of unweighted networks, where edges are either present or absent, the algorithm can no longer be applied directly. However, we show that on examples where a ground truth is available, useful estimates of the dimension can be recovered by spectrally embedding into successively higher dimensional Euclidean space and applying twoNN at each stage. In cases where the ground truth has been contaminated by noise we find that, unlike the Laplacian spectrum, twoNN is robust and informative. In a final experiment on real data the twoNN estimate remains consistent when applied to a K nearest neighbour binarization of the underlying weighted network.

Set-up
Suppose we are given a non-negative $N \times N$ symmetric dissimilarity matrix $M$. We assume that there are $N$ distinct underlying objects, $x_i$, within some universal set $S$, and that the element $M_{ij}$ for $i \neq j$ measures the dissimilarity between objects $x_i$ and $x_j$; so a larger $M_{ij}$ indicates that $x_i$ and $x_j$ are more dissimilar. Ideally, elements of $M$ should also satisfy a triangle relationship; that is, $M_{ij} \le M_{ik} + M_{kj}$ for all distinct $i, j, k$. This would hold by construction if elements of $M$ were derived from a metric space, $(S, \delta)$, containing the underlying $N$ objects, with $M_{ij} = \delta(x_i, x_j)$. However, a triangle relationship is not necessary in what follows.
In some contexts, $M$ will be implied via a weighted graph or a similarity matrix: we have a symmetric $W \in \mathbb{R}^{N \times N}$ such that each $W_{ij} \ge 0$ denotes the weight or strength of the connection between the distinct objects $x_i$ and $x_j$, with $W_{ii} = 0$. In this case, a larger $W_{ij}$ indicates that $x_i$ and $x_j$ are more similar.
To convert between a dissimilarity matrix M and a similarity matrix W , we may use, for example for some σ [11].
Given a dissimilarity matrix, $M$, with no notion of $S$, we are often tasked with embedding $M$ into some Euclidean space: that is, we wish to define a set of real location vectors, $\{y^{[i]}\}_{i=1}^{N}$ in $\mathbb{R}^k$, such that the Euclidean distances $\|y^{[i]} - y^{[j]}\|$ are monotonic functions of the $M_{ij}$. Ideally, the embedding dimension, $k$, should reflect the inherent dimension $d$ of the data. There are many ways to embed such matrices, or equivalently the weighted graphs, and these usually involve some spectral analysis of a matrix related to $M$ or $W$.
For unweighted graphs, where $W$ is binary and symmetric, there has been much interest within combinatorics. The dimension of a graph is the smallest value of $d$ for which its vertices may be embedded in $\mathbb{R}^d$ such that the distances between the endpoints of each and every edge are equal to unity. Furthermore, in answer to a question of [12], any graph with fewer than $\binom{d+2}{2}$ edges has dimension at most $d$; while the dimension of a graph with maximum degree $d$ is at most $d$ [14].
In this work we are concerned with the more practical question of how to compute a representative value for d when it is likely that the network information is noisy or incomplete.

Spectral Embedding
Given the similarity matrix $W$, it is common to embed the graph into a $k$-dimensional Euclidean space using a spectral method. The embedding may be regarded as an $N \times k$ matrix $G = [g^{[1]}, g^{[2]}, \ldots, g^{[k]}]$, with orthonormal columns $g^{[j]} \in \mathbb{R}^N$. The $i$th object $x_i$ is given coordinates according to the $i$th row of $G$; that is,
$$y^{[i]} = \bigl(g^{[1]}_i, g^{[2]}_i, \ldots, g^{[k]}_i\bigr)^T \in \mathbb{R}^k. \tag{1}$$
As described in [2], it is natural to specify $G$ via
$$\arg\min_{g^{[1]}, g^{[2]}, \ldots, g^{[k]}} \; \sum_{i=1,\,j=1}^{N} W_{ij} \, \|y^{[i]} - y^{[j]}\|^2$$
with $G^T G = I$. This leads to a solution where the columns $g^{[j]}$ are given by eigenvectors corresponding to the $k$ lowest nonzero eigenvalues of the Laplacian matrix $L = D - W$. Here, $D \in \mathbb{R}^{N \times N}$ is the diagonal matrix whose diagonal contains the row/column sums of $W$. We note that $L$ is self-adjoint and has a zero eigenvalue with geometric multiplicity given by the number of connected components of the graph [28]. Other related methods are available [21]. This type of spectral embedding approach is closely related to principal component analysis and multi-dimensional scaling [5,16], and similar techniques are used for clustering [9,28], ranking [10,17,26], subgraph detection [6] and graph visualization [18]. How should we best choose the embedding dimension, $k$? Typically one might examine the spectrum of $L$, perhaps on a log scale, and search for an upward step, or gap, discarding the eigenvalues/eigenvectors to the right [8,28]. In many circumstances, though, no clear step may be apparent.
In essence, for a given choice of dimension k spectral information from the Laplacian provides an optimal embedding, in a least squares sense.But that spectral information does not always tell us how to choose the best k.
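As a minimal sketch of the spectral embedding just described, the following NumPy function forms $L = D - W$ and keeps the eigenvectors of the $k$ lowest nonzero eigenvalues. The function name `spectral_embedding` and the tolerance `tol` used to decide which eigenvalues count as zero are illustrative assumptions, not part of the original work; a dense eigensolver is used for simplicity.

```python
import numpy as np

def spectral_embedding(W, k, tol=1e-9):
    """Return (G, eigenvalues): G is N x k, and row i of G is the location y^[i].

    Hypothetical helper: uses the unnormalised Laplacian L = D - W and a
    relative tolerance `tol` (an assumption) to skip the zero eigenvalues.
    """
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W            # D holds the row sums of W
    # eigh returns ascending eigenvalues and orthonormal eigenvectors
    evals, evecs = np.linalg.eigh(L)
    # one (near-)zero eigenvalue per connected component is discarded
    nonzero = np.flatnonzero(evals > tol * max(evals[-1], 1.0))
    keep = nonzero[:k]
    return evecs[:, keep], evals[keep]
```

Because `eigh` returns orthonormal eigenvectors, the constraint $G^T G = I$ holds automatically. For large sparse graphs a sparse eigensolver would normally replace the dense call.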
In Figure 1 we show the Laplacian spectrum for two examples where $N = 3{,}000$. Here we sampled vectors $x_i \in \mathbb{R}^{25}$ with components chosen independently and uniformly at random in $[0, 1]$, and set $W_{ij} = 1/\|x_i - x_j\|$, for $i \neq j$. So, by design, we hope that the graph (or the matrix) will be embeddable in $\mathbb{R}^{25}$. Yet this is not reflected by the spectrum shown in Figure 1: an "eyeball" search for a gap in the spectrum would not highlight a dimension of 25.
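The construction of this test case can be sketched as follows. The seed and the reduced size `N = 200` are assumptions made so the example runs quickly; the experiment in the text uses 3,000 points.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed, chosen for reproducibility
N, dim = 200, 25                    # the text uses N = 3,000; smaller here for speed

X = rng.random((N, dim))            # components independent and uniform on [0, 1]
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))     # pairwise Euclidean distances

W = np.zeros((N, N))
off_diag = ~np.eye(N, dtype=bool)
W[off_diag] = 1.0 / dist[off_diag]           # W_ij = 1 / ||x_i - x_j||, W_ii = 0

L = np.diag(W.sum(axis=1)) - W               # Laplacian L = D - W
eigenvalues = np.linalg.eigvalsh(L)          # ascending; the first is numerically zero
```

Plotting `eigenvalues[1:]`, perhaps on a log scale, gives the kind of spectrum one would inspect for a gap.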

TwoNN and Weighted Graphs
In this work we investigate the approach of choosing the embedding dimension with the twoNN method, which was developed in [13] to estimate what those authors refer to as the intrinsic (fractal) dimension of sparse point clouds in very high dimensional Euclidean spaces; this may be viewed as the dimension of an underlying continuous manifold from which the data points are sampled. We refer to [4] for further discussion of intrinsic dimension and connections to learning theory and topological data analysis. With twoNN, for all objects in the point cloud one finds the distances to the nearest neighbour and to the second nearest neighbour. The ratio, $\mu > 1$, of the latter distance to the former, calculated separately for all points, produces an empirical cumulative distribution, say $F_{\mathrm{emp}}(\mu)$. This can be compared with the expression $F(\mu) = 1 - \mu^{-d}$ that the authors derived under the assumption of local uniform sampling density from a space of dimension $d$. Comparing the empirical and exact cumulative distributions allows us to estimate the dimension $d$. If we let $\mu_i$ denote the second-to-first nearest neighbour distance ratio for node $i$, and let $\sigma$ denote a permutation vector such that $\mu_{\sigma(i)}$ are in ascending order, then the empirical cumulative distribution has
$$F_{\mathrm{emp}}(\mu_{\sigma(i)}) = \frac{i}{N}.$$
Hence the points $[\log \mu_{\sigma(i)}, \, -\log(1 - F_{\mathrm{emp}}(\mu_{\sigma(i)}))]$ should lie on a line of slope $d$. So the points
$$d_i := -\frac{\log(1 - i/N)}{\log \mu_{\sigma(i)}} \tag{2}$$
are estimates for the dimension $d$.
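The slope claim can be checked directly from the model distribution. Starting from $F(\mu) = 1 - \mu^{-d}$ and taking logarithms:

```latex
\begin{align*}
1 - F(\mu) &= \mu^{-d}, \\
-\log\bigl(1 - F(\mu)\bigr) &= d \, \log \mu .
\end{align*}
```

So, if the empirical distribution follows the model, plotting $-\log(1 - F_{\mathrm{emp}}(\mu))$ against $\log \mu$ gives a straight line through the origin with slope $d$, and the ratio of ordinate to abscissa at any single point is itself an estimate of $d$.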
When we are presented with data in the form of a dissimilarity matrix $M$, we may infer pairwise distances directly: the two nearest neighbour distances of an object simply correspond to the two smallest off-diagonal elements of the corresponding row of $M$. In Figure 2 we illustrate the method on the two data sets used in Figure 1. Here, we set $M_{ij} = 1/W_{ij} = \|x_i - x_j\|$ for $i \neq j$. As discussed in [13], for small $i$ we expect sampling errors to dominate, with too little information in play; whereas if $i$ is too large then we would see the lack of very large-scale distance differences affect the estimate (since the distances are globally bounded). So a sweet spot is desirable, where the estimate for $d$ is relatively static. In Figure 2 we highlight in red the portion of the curve where $N/4 \le i \le 3N/4$, which leads to stable estimates, and throughout this work we use the mean of $d_i$ over this range as our overall estimate of $d$. For Figure 2 we obtain estimates of 18.6 and 18.4. This approach appears much more definitive than a visual search for a spectral gap. We emphasize that in Figures 1 and 2, the $N = 3{,}000$ data points $x_i$ were generated within the cube $[0, 1]^{25}$. Yet this cube has $2^{25} = 33{,}554{,}432$ corners, so this set of points will hardly get close to any of the extremities. Hence, this example is challenging: high dimensional spaces are very lonely places. Moreover, 25 should be considered as a hard upper bound for any estimate of $d$, with the distance/dissimilarity matrix likely to be more consistent with a smaller dimension.
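The procedure for a dissimilarity matrix (two smallest off-diagonal entries per row, sorted ratios, and the mean of $d_i$ over $N/4 \le i \le 3N/4$) can be sketched as below. The function names are hypothetical, and the normalisation $F_{\mathrm{emp}}(\mu_{\sigma(i)}) = i/N$ is an assumption of this sketch; the index $i = N$ is dropped because $\log(1 - i/N)$ diverges there.

```python
import numpy as np

def twonn_estimates(M):
    """Pointwise estimates d_i, i = 1..N-1, from a symmetric dissimilarity matrix M."""
    M = np.array(M, dtype=float)                 # copy, so the caller's matrix is untouched
    N = M.shape[0]
    np.fill_diagonal(M, np.inf)                  # exclude each object's distance to itself
    two_smallest = np.partition(M, 1, axis=1)[:, :2]   # per row: r1 <= r2
    mu = np.sort(two_smallest[:, 1] / two_smallest[:, 0])  # ratios in ascending order
    i = np.arange(1, N)                          # i = N is dropped (log(0) otherwise)
    return -np.log(1.0 - i / N) / np.log(mu[:-1])

def twonn_dimension(M):
    """Overall estimate: mean of d_i over N/4 <= i <= 3N/4."""
    d = twonn_estimates(M)
    N = len(d) + 1
    lo, hi = int(np.ceil(N / 4)), int(np.floor(3 * N / 4))
    return d[lo - 1:hi].mean()                   # d[i - 1] stores d_i
```

For a point cloud `X`, one would pass the matrix of pairwise Euclidean distances as `M`. Tied distances (ratio exactly 1) are not handled in this sketch.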
From a theoretical perspective, sufficient conditions are known for a discrete graph Laplacian to converge spectrally to the Laplace-Beltrami operator over an underlying sampling manifold, e.g. [27]. These conditions, referred to as (3) below, are stated in terms of $\eta$, the radially symmetric kernel function such that $W_{ij} =: \eta(x_i - x_j)$ for all $i, j \in \{0, \ldots, N\}$, and of $\boldsymbol{\eta}$, the radial profile, or shape, of $\eta$, i.e., $\eta(x) = \boldsymbol{\eta}(\|x\|)$ for all $x \in \mathbb{R}^d$. However, in a typical practical setting, such conditions cannot be validated, in which case there is no guarantee that the spectral embedding is consistent. We note that the estimate of $d \approx 18.5$ from Figure 2 provided by twoNN appears acceptable, even though the conditions in (3) for $W$ are not satisfied. Moreover, we show in Figure 3 an experiment where $N$ is reduced to 300. Here, twoNN still produces reasonable estimates of 16.5 and 16.8. We also note that the algorithm is insensitive to smooth rescaling of the distance measure, in the sense that if $r_2$ and $r_1$ are two small distances then
$$\frac{f(r_2)}{f(r_1)} \approx \frac{r_2}{r_1}$$
for any differentiable function $f$ such that $f(0) = 0$ and $f'(0) \neq 0$. We therefore suggest that direct application of the twoNN method from [13] on a weighted network provides reliable information for choosing a spectral embedding dimension $k$, and at the very least may be regarded as a back-up procedure or sanity check for the widely-adopted approach of visually inspecting the spectrum of the Laplacian. In the remainder of the manuscript we focus on the more challenging case of an unweighted network, where the algorithm is not directly applicable.
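The insensitivity to smooth rescaling noted above follows from a first-order Taylor expansion. For differentiable $f$ with $f(0) = 0$ and $f'(0) \neq 0$, and small distances $r_1, r_2$:

```latex
\begin{align*}
f(r) &= f'(0)\, r + o(r) \quad \text{as } r \to 0, \\
\frac{f(r_2)}{f(r_1)} &= \frac{f'(0)\, r_2 + o(r_2)}{f'(0)\, r_1 + o(r_1)}
  = \frac{r_2}{r_1}\,\bigl(1 + o(1)\bigr).
\end{align*}
```

The factor $f'(0)$ cancels, so the second-to-first nearest neighbour ratio $\mu$, which is all that twoNN uses, is unchanged to leading order. Note that this argument does not cover a transformation such as $r \mapsto r^2$, for which $f'(0) = 0$.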

TwoNN and Unweighted Graphs
With an unweighted graph, it remains natural to seek a node embedding such that nearby nodes are connected and distant nodes are unconnected. For example, we may postulate that the connectivity structure in the graph arose from some sort of (unobserved) binarization mechanism, such as K nearest neighbour or radius-based thresholding (geometric). For an unweighted graph, the notion of first and second nearest neighbour is not immediately applicable. Hence, we advocate an indirect approach where spectral embedding is used as an intermediate step. Here, we look for the largest $s$ such that after spectrally embedding into dimension $s$ the twoNN algorithm also delivers an estimate close to $s$ for the dimension. Intuitively, if we spectrally embed into a dimension that is unnecessarily small, then twoNN will reproduce this dimension, whereas if we spectrally embed into a dimension that is unnecessarily large, then twoNN will find the appropriate, smaller dimension.
To be concrete, in Algorithm 1 we outline the steps involved in spectral embedding followed by the application of twoNN. Our overall approach is then to apply Algorithm 1 for $s = 2, 3, \ldots$ and observe how the twoNN estimate $d^\star$ compares with the embedding dimension $s$, stopping when $d^\star$ plateaus as a function of $s$.
In the next subsection, we test this approach as follows. First, we create a ground truth by starting with a node sampling. We then binarize the pairwise distance information using a K nearest neighbour construction. We consider two different sampling settings; first from the standard normal distribution on $\mathbb{R}^d$ and second using components that are uniform on $[0, 1]$. The case of binarization via a geometric graph construction is considered in subsection 5.2.

Algorithm 1: Combination of spectral embedding and twoNN that can be used iteratively to estimate the intrinsic dimension of an unweighted network.
    input : Similarity (adjacency) unweighted matrix, $W \in \mathbb{R}^{N \times N}$; trial embedding dimension, $s$
    output: twoNN dimension estimate, $d^\star$
    $\Delta$ ← graph Laplacian obtained from $W$;
    $g^{[1]}, \ldots, g^{[s]}$ ← $s$ eigenvectors associated with the first $s$ non-trivial eigenvalues of $\Delta$;
    $y^{[1]}, \ldots, y^{[N]}$ ← $N$ embedding locations from (1);
    for $i = 1 : N$ do
        $r^1_i, r^2_i$ ← 1st and 2nd nearest neighbour distances for $y^{[i]}$;
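A minimal NumPy sketch of one pass of Algorithm 1 is given below. It is an illustration under stated assumptions rather than the authors' implementation: the function name `algorithm1_sketch` is hypothetical, the Laplacian is taken to be the unnormalised $L = D - W$, "non-trivial" is read as "above a small relative tolerance", and the final steps (forming the ratios and averaging $d_i$ over $N/4 \le i \le 3N/4$) follow the twoNN description for weighted graphs with the assumed normalisation $i/N$.

```python
import numpy as np

def algorithm1_sketch(W, s, tol=1e-9):
    """Spectrally embed an unweighted graph into R^s, then apply twoNN."""
    W = np.asarray(W, dtype=float)
    N = W.shape[0]

    # Graph Laplacian and its s lowest non-trivial eigenpairs
    L = np.diag(W.sum(axis=1)) - W
    evals, evecs = np.linalg.eigh(L)
    keep = np.flatnonzero(evals > tol * max(evals[-1], 1.0))[:s]
    Y = evecs[:, keep]                      # row i is the embedded location y^[i]

    # First and second nearest neighbour distances for every embedded point
    r1, r2 = np.empty(N), np.empty(N)
    for i in range(N):
        dist = np.linalg.norm(Y - Y[i], axis=1)
        dist[i] = np.inf                    # ignore the point itself
        r1[i], r2[i] = np.partition(dist, 1)[:2]

    # twoNN: sorted ratios, pointwise estimates, mean over the middle half
    mu = np.sort(r2 / r1)
    idx = np.arange(1, N)                   # i = N dropped, since log(1 - i/N) diverges
    d = -np.log(1.0 - idx / N) / np.log(mu[:-1])
    lo, hi = int(np.ceil(N / 4)), int(np.floor(3 * N / 4))
    return d[lo - 1:hi].mean()
```

The iterative use described in the text then amounts to calling `algorithm1_sketch(W, s)` for $s = 2, 3, \ldots$ and recording the returned value until it stops growing with $s$. The eigendecomposition could be computed once and reused across values of $s$; it is repeated here only to keep the sketch short.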

Results for K Nearest Neighbour Construction
Given $K$, let $A \in \mathbb{R}^{N \times N}$ be the adjacency matrix of a K nearest neighbour (KNN) graph constructed as follows. Associate $x^{[i]}$ with a vector in $\mathbb{R}^d$ with independently chosen entries and record an edge between every distinct pair of nodes if one of them is among the $K$ nearest neighbours of the other. There exist alternative constructions for building a K nearest neighbour graph, see, for example, [20], all yielding a symmetric affinity matrix, and we expect the computational results and choices of parameters to be equivalent for these alternative constructions.
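The construction above, with an edge whenever either node is among the other's $K$ nearest neighbours, can be sketched as follows. The function name `knn_adjacency` is hypothetical, and the dense $N \times N$ distance computation is a simplification that would normally be replaced by a tree-based or approximate neighbour search for large $N$.

```python
import numpy as np

def knn_adjacency(X, K):
    """Symmetric 0/1 adjacency matrix of the KNN graph on the rows of X (requires K < N)."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T    # squared pairwise distances
    np.fill_diagonal(D2, np.inf)                      # a node is not its own neighbour
    # column indices of the K smallest entries in each row (order within them is arbitrary)
    nearest = np.argpartition(D2, K - 1, axis=1)[:, :K]
    A = np.zeros((N, N), dtype=int)
    A[np.repeat(np.arange(N), K), nearest.ravel()] = 1
    return np.maximum(A, A.T)                         # symmetrise: "one of them" suffices
```

If the logarithm in $K = \lfloor 30 \log N \rfloor$ is natural (the base is an assumption here), then $N = 3{,}000$ gives $K = 240$.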
It is known [29,1] that unless $K = \Omega(\log N)$, the K nearest neighbour graph (any construction) is almost surely not connected for $N$ sufficiently large. It is also known [7] that one must choose $K = \omega(\log N)$ in order for the discrete graph Laplacian to converge spectrally to the underlying continuous Laplace-Beltrami operator, whose spectrum characterizes the geometry of the underlying sampling domain. If this condition is not satisfied, the eigenvectors of the discrete Laplacian are not guaranteed to converge to the eigenfunctions of the associated Laplace-Beltrami operator; hence the spectral embedding in (1) is not guaranteed to be accurate. Based on this observation, it makes sense to choose $K = \omega(\log N)$. In our experiments, we fix $d = 25$, $N = 3{,}000$, and choose $K := \lfloor 30 \log N \rfloor$.
Figure 4 shows the estimation of the intrinsic dimension from Algorithm 1, for each embedding dimension in $\{15, 16, \ldots, 30\}$, in the case where we sample $N = 3{,}000$ points from a Gaussian in $\mathbb{R}^{25}$. We see that the slope increases and stabilises around 25. Figure 5 shows that the corresponding graph Laplacian has a spectral jump at $d = 25$. In the next section we will show that this spectral information degrades in the presence of noise.
We obtain similar pictures in the case where we sample the data point components uniformly in [0, 1], as indicated in Figures 6 and 7.

Geometric Graph Constructions and the Curse of Dimensionality
For random geometric graphs where edges are inserted when the pairwise distance is below a radius $r > 0$, there are well-known results on connectivity; see, for example, [23,24], which provide asymptotic conditions. However, choosing $r$ suitably in practice remains challenging, as we are typically in a non-asymptotic regime, and becomes infeasible as the intrinsic dimension of the data becomes large. This is an example of what is commonly referred to as the curse of dimensionality. We illustrate the effect with the following geometric construction, which helps to explain what may go wrong when the dimension is large.
Let $e_1, \ldots, e_d$ be orthonormal vectors in the $d$ directions spanned by the axes of $\mathbb{R}^d$. The rectangle $R$ has exactly two faces per direction ($(d-1)$-dimensional faces), and we can assume, without loss of generality, that the interval $[0, 1]$ of $R$ is along the $e_1$ direction, so that the two faces in the $e_1$ direction are squares contained in two faces of the unit cube. These faces can be written as where $\epsilon \in \{0, 1\}$. Letting $k \in \{2, \ldots, d\}$, the two faces associated to the $e_k$ direction can be written as where $\gamma \in \{r, 1 - r\}$ is the constant value of the $k$th coordinate of every element of a given face in the $e_k$ direction.
For a direction $k \in \{2, \ldots, d\}$ and a face in the $e_k$ direction given by the choice of γ ∈ {r, where Each such rectangle "closes the gap between two adjacent corners of the unit cube", such that the set obtained, after removing from the unit cube $R$ and every $R_{k,\gamma}$, $k \in \{2, \ldots, d\}$, $\gamma \in \{r, 1 - r\}$, consists of $2^{d-1}$ connected components. Furthermore, the volume of the union of these sets is which gets very small very quickly as the dimension $d$ increases, even for small values of $r$. For instance, for $d = 10$ and $r = 0.1$, this volume is 0.135, for $d = 15$, it is 0.0440, and for $d = 25$, it is 0.0047. Thus, unless the dimension of the data set is small, most of the sampled points will be equi-distributed among the $2^{d-1}$ connected components of the unit cube after having removed $R \cup \bigcup_{k,\gamma} R_{k,\gamma}$, which as a union do not form a connected set. It is not possible to create a connected geometric graph from such a sample, unless we choose $r > 0$ trivially to be so large that its size is comparable to the side width of the unit cube, in which case the generated graph will be close to being complete. This issue does not arise with the KNN construction, since the number of neighbours is fixed and determined by the choice of $K$, which allows us to choose $K$ such that the graph is both connected and sufficiently sparse, even in high dimensions.
This thought experiment also indicates that, in our context, it is unreasonable to assume that a high-dimensional unweighted graph arose from a random geometric graph construction: sparsity and connectivity are unlikely to hold simultaneously. A KNN construction is thus a more realistic binarization mechanism.

Observations on non-uniform sampling densities and KNN constructions
Neither the uniform distribution on the unit cube nor the Gaussian distribution on $\mathbb{R}^d$ is uniform in the underlying metric space (here $\mathbb{R}^d$ with the Euclidean distance in both cases). Hence the sampling density will not be approximately constant in the neighbourhood of every sampled point, a condition that was assumed in the derivation of the twoNN algorithm in [13].
In the case of the uniform distribution on the unit cube, the sampling density fails to be approximately constant around points close to the boundary, while in the case of a Gaussian distribution, the sampling density is never approximately constant, in any neighbourhood. However, the KNN graph construction allows us to cancel to some extent the negative effect of the non-uniformity of the sampling densities, as seen in the accurate estimates of the intrinsic dimension of the data in Section 5.1. By analogy with a geometric graph, where points are connected if they are at a distance less than a fixed bandwidth parameter $r > 0$, a KNN graph can be thought of as connecting points if they are at a distance less than a varying bandwidth parameter, inversely proportional to the values taken by the density in a given neighbourhood of those points. Every point is connected to roughly the same number of neighbours, mimicking a geometric graph on points sampled from a uniform density, and canceling the effect of irregularities in the sampling domain, such as boundaries. We can thus expect the spectrally embedded points in twoNN to be approximately uniformly distributed in the case of a KNN construction, even if the points were sampled from a non-uniform density on the original domain.
We test this hypothesis by looking at the distribution of the embedded points $\{y^{[i]} \mid i \in \{1, \ldots, N\}\}$ in $\mathbb{R}^s$, $s > d$, in the case where the points are uniformly sampled from the unit cube $[0, 1]^d$. In the figures below, we choose $d = 25$ and $s = 30$. If the sampling density is uniform, then the nearest neighbour distances of the sampled points should concentrate highly around their expected value, forming an inverse exponential distribution centered around the expected nearest distance. In Figure 8, we plot and compare the nearest distances for all embedded points. We observe that the nearest neighbour distances are indeed highly concentrated around their expected value, which suggests that the embedded points in the image space are approximately uniformly distributed. In Figure 9, the approximate uniform distribution of the embedded points is confirmed by the observation that the nearest neighbour distances concentrate around their expected value following a Gaussian-like distribution.

Tests on Noisy Data
The experiments in section 5 assume that the similarity matrix is recorded with perfect information. In that idealized setting, we found that the twoNN algorithm did not provide a better method to infer the intrinsic dimension of the data than a simple spectral gap reading. In this section we test a more realistic scenario where the information is recorded with noise.
For every entry of the upper triangular part of the binary adjacency matrix $A$, we change independently the value from 0 to 1 or from 1 to 0 with probability $p$, and otherwise leave it unchanged. To keep the matrix $A$ symmetric, the upper triangular part of $A$ determines the remaining entries of $A$. In other words, each edge and each missing edge is flipped independently with probability $p$. In our experiment, we used intrinsic dimension $d = 15$ and sampled $N = 17{,}000$ vectors uniformly at random from the unit cube $[0, 1]^d$. We generated $A$ from a KNN construction, picking $K = \lfloor 30 \log(N) \rfloor$ as before, and introduced noise to $A$ as described above with parameter $p = 0.01$.
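The noise model can be sketched in a few lines. The function name `flip_edges` and the optional `rng` argument are hypothetical; "upper triangular part" is read here as the strictly upper triangular part, so that no self-loops are created.

```python
import numpy as np

def flip_edges(A, p, rng=None):
    """Flip each edge / missing edge of a symmetric 0/1 matrix independently with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    A = np.asarray(A).astype(int)
    N = A.shape[0]
    iu = np.triu_indices(N, k=1)                 # strictly upper triangular entries
    flips = rng.random(len(iu[0])) < p           # one independent Bernoulli(p) draw per pair
    upper = np.zeros((N, N), dtype=int)
    upper[iu] = np.where(flips, 1 - A[iu], A[iu])
    return upper + upper.T                       # mirror to restore symmetry; diagonal stays 0
```

Because missing edges vastly outnumber edges in a sparse graph, even a small $p$ adds many more spurious edges than it removes genuine ones.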
With this addition of noise, Algorithm 1 gives a reasonable estimate of the intrinsic dimension $d = 15$, as shown in Figure 10. In Figure 11 we see, however, that the spectrum of the Laplacian does not give compelling evidence. Intuitively, the twoNN construction should be less sensitive to the presence of a small percentage of missing or spurious edges, since these perturbations will not affect the majority of the first or second pairwise distances.

A Test on Real Data
We finish with a test on MNIST image data [19]. Here each data point in $\mathbb{R}^{784}$ represents the 28 × 28 greyscale pixel values of a handwritten digit, from 0 to 9. We loaded 5,000 images using digitTrain4DArrayData in MATLAB [22], where each image has been arbitrarily rotated. Figure 12 shows the Laplacian spectrum for the similarity matrix based on the reciprocal of Euclidean distance. We then constructed the binarized $K = 20$ nearest neighbour graph. Figure 13 shows the corresponding spectrum. We see that neither spectrum plot shows a definitive spectral gap, and the two plots are not consistent.
Applying twoNN to the Euclidean distance data gave a dimension estimate of $d^\star = 3.37$. For the $K = 20$ nearest neighbour version of the data, Figure 14 shows the results from Algorithm 1 as the embedding dimension $s$ varies from 5 to 20. We see that on this binarized data, it is possible to use twoNN in a way that gives an estimate that is consistent with the result on the original pairwise distance data.
We emphasize that the aim of this test was not to produce a definitive result for the underlying dimension of MNIST. This issue has been tackled in a number of works, including [15,25], with answers that depend on the way that the concept of dimension is introduced. We aimed to test instead whether Algorithm 1 allows twoNN to remain consistent under K nearest neighbour binarization. We note however, that the MNIST data can be

Conclusions
The twoNN algorithm in [13] gives a computationally efficient way to estimate the dimension of a data cloud, assuming the points are samples from a continuous manifold. The algorithm requires only first and second nearest neighbour distances; for a weighted (undirected) network this information is immediately available and we found that the algorithm performed well on examples where a ground truth is available and where information from the Laplacian spectrum was not useful. For unweighted networks, where edges are either present or absent, the algorithm can no longer be applied directly. However, we showed that consistent estimates of the dimension can be recovered by spectrally embedding into successively higher dimensional Euclidean space and applying twoNN at each stage. We also found that this approach was more robust to noise than the direct use of the Laplacian spectrum, and more consistent under K nearest neighbour binarization.
Overall, these results highlight and extend the usefulness of twoNN in the context of network analysis.

Figure 1: Ordered eigenvalues of the Laplacian based on inverse pairwise Euclidean distances between data clouds of $N = 3{,}000$ points in $\mathbb{R}^{25}$. Two independent instances are shown. Interior plots focus on the first 40 nonzero eigenvalues. The spectrum gives no obvious argument for an embedding in dimension 25.

Figure 2: TwoNN estimates for the embedding dimension. We show dimension estimates $d_i$ in (2) versus the index $i$.

Figure 3: As for Figure 2, but with the number of data points reduced from N = 3000 to N = 300.

Figure 4: Results from Algorithm 1 with dimension $s$ varying from 15 to 30 on the horizontal axis. Here we sampled $N = 3{,}000$ points from a Gaussian in $\mathbb{R}^{25}$ and used a KNN construction to produce an unweighted graph. The red dots, blue dots and asterisks show the minimum, maximum and mean $d_i$ value for each $s$.

Figure 5: First thirty ordered nonzero eigenvalues for the Laplacian of an unweighted KNN graph from the experiment in Figure 4, showing a spectral jump at dimension d = 25.

Figure 8: Nearest neighbour distance for each of the 3,000 points uniformly sampled from the unit cube in $\mathbb{R}^{25}$, spectrally embedded in $\mathbb{R}^{30}$, obtained from a KNN construction. We observe that the nearest neighbour distances concentrate tightly around an expected distance. This suggests the embedded points are close to being uniformly distributed, in which case their nearest neighbour distances would follow an approximately normal distribution.

Figure 9: Sampling density of the nearest neighbour distances from the spectral embedding of dimension 30, from points sampled uniformly from the unit cube in $\mathbb{R}^{25}$, obtained from a KNN construction as in Figure 8. The horizontal axis indicates the value of the nearest neighbour distance, and the vertical axis indicates the number of points whose nearest neighbour distance takes this value. The normal-like distribution suggests that the points in the spectral embedding are approximately uniformly distributed.

Figure 10: Results from Algorithm 1 with dimension $s$ varying from 5 to 30 on the horizontal axis. Here we sampled $N = 17{,}000$ points from the unit cube $[0, 1]^{15}$, used a KNN construction to produce an unweighted graph and then flipped each edge/missing edge with independent probability $p = 0.01$. The red dots, blue dots and asterisks show the minimum, maximum and mean $d_i$ value for each $s$.

Figure 11: First thirty ordered eigenvalues for the Laplacian of an unweighted KNN graph from the experiment in Figure 10 based on data points from $[0, 1]^{15}$.

Figure 12: Ordered nonzero eigenvalues of the Laplacian based on inverse pairwise Euclidean distances between $N = 5{,}000$ handwritten digits from 0 to 9. Interior plot zooms in on the smallest eigenvalues.

Figure 13: Ordered nonzero eigenvalues of the Laplacian based on the $K = 20$ nearest neighbour graph from $N = 5{,}000$ handwritten digits from 0 to 9. Interior plot zooms in on the smallest eigenvalues.

Figure 14: Results from Algorithm 1 with dimension $s$ varying from 5 to 30 on the horizontal axis. Here we used the same nearest neighbour graph as in Figure 13.