Spectral graph theory studies graphs using associated matrices such as the adjacency matrix and graph Laplacian. Let \(G(V, E)\) be a graph. We’ll let \(n = |V|\) denote the number of vertices/nodes, and \(m = |E|\) denote the number of edges. We’ll assume that vertices are indexed by \(0,\dots,n-1\), and edges are indexed by \(0,\dots,m-1\).
The adjacency matrix \(A\) is an \(n\times n\) matrix with \(A_{i,j} = 1\) if \((i,j) \in E\) is an edge, and \(A_{i,j} = 0\) if \((i,j) \notin E\). If \(G\) is an undirected graph, then \(A\) is symmetric. If \(G\) is directed, then \(A\) need not be symmetric.
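As a minimal sketch of this definition, the adjacency matrix can be assembled in sparse format from an edge list. The helper name `adjacency_matrix` and the small example graph are illustrative assumptions, and the code assumes the edge list has no duplicate edges or self-loops.

```python
import numpy as np
import scipy.sparse as sp

def adjacency_matrix(n, edges, directed=False):
    """Build a sparse n x n adjacency matrix from a list of (i, j) edges.

    Assumes no duplicate edges (duplicates would be summed to values > 1).
    """
    rows = np.array([i for i, j in edges], dtype=int)
    cols = np.array([j for i, j in edges], dtype=int)
    if not directed:
        # Store each undirected edge in both directions so A is symmetric.
        rows, cols = np.concatenate([rows, cols]), np.concatenate([cols, rows])
    vals = np.ones(len(rows))
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))

# Hypothetical example: the path 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]
A = adjacency_matrix(4, edges)                   # undirected: symmetric
A_dir = adjacency_matrix(4, edges, directed=True)  # directed: not symmetric
```

In the undirected case each edge contributes two stored entries, one on each side of the diagonal.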
The degree of a node \(i\), \(\deg(i)\), is the number of neighbors of \(i\), meaning the number of edges which \(i\) participates in. You can calculate the vector of degrees (a vector \(d\) of length \(n\), where \(d_i = \deg(i)\)) using matrix-vector multiplication:
\begin{equation}d = A 1\end{equation}
where \(1\) is the vector containing all 1s of length \(n\). You could also just sum the row entries of \(A\). We will also use \(D = \operatorname{diag}(d)\), a diagonal matrix with \(D_{i,i} = d_i\).
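The two ways of computing degrees can be compared on a small example. This is only a sketch: the graph (a triangle with one pendant vertex) is a made-up example, and the variable names are not from the original notes.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical graph: triangle 0-1-2 with a pendant vertex 3 attached to 2.
rows = np.array([0, 1, 0, 2, 1, 2, 2, 3])
cols = np.array([1, 0, 2, 0, 2, 1, 3, 2])
A = sp.csr_matrix((np.ones(8), (rows, cols)), shape=(4, 4))

ones = np.ones(A.shape[0])
d = A @ ones                                   # d = A 1
d_rowsum = np.asarray(A.sum(axis=1)).ravel()   # the same vector via row sums
D = sp.diags(d)                                # D = diag(d), stored sparsely
```

Both `d` and `d_rowsum` hold the degrees 2, 2, 3, 1 for this graph.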
The incidence matrix \(B\) is an \(n \times m\) matrix which encodes how edges and vertices are related. Let \(e_k = (i,j)\) be an edge. Then the \(k\)-th column of \(B\) is all zeros except \(B_{i,k} = -1\) and \(B_{j,k} = +1\) (for undirected graphs, it doesn’t matter which of \(B_{i,k}\) and \(B_{j,k}\) is \(+1\) and which is \(-1\), as long as they have opposite signs). Note that \(B^T\) acts as a sort of difference operator on functions of vertices, meaning \(B^T f\) is a vector of length \(m\) which encodes the difference in function value over each edge.
You can check that \(B^T 1_C = 0\), where \(1_C\) is a connected component indicator (\(1_C[i] = 1\) if \(i \in C\), and \(1_C[i] = 0\) otherwise). \(C\subseteq V\) is a connected component of the graph if all vertices in \(C\) have a path between them, and there are no vertices in \(V\) that are connected to \(C\) which are not in \(C\). Since \(1\) is the sum of the indicators of all connected components, this implies \(B^T 1 = 0\).
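The incidence matrix and its action as a difference operator can be sketched as follows. The helper `incidence_matrix`, the two-component example graph, and the test function `f` are assumptions made for illustration.

```python
import numpy as np
import scipy.sparse as sp

def incidence_matrix(n, edges):
    """Sparse n x m incidence matrix: column k has -1 at i and +1 at j for e_k = (i, j)."""
    m = len(edges)
    rows = np.array([v for e in edges for v in e], dtype=int)   # i_0, j_0, i_1, j_1, ...
    cols = np.repeat(np.arange(m), 2)                            # 0, 0, 1, 1, ...
    vals = np.tile([-1.0, 1.0], m)                               # -1, +1, -1, +1, ...
    return sp.csc_matrix((vals, (rows, cols)), shape=(n, m))

# Hypothetical graph with two components: the path {0, 1, 2} and the edge {3, 4}.
edges = [(0, 1), (1, 2), (3, 4)]
B = incidence_matrix(5, edges)

f = np.array([0.0, 1.0, 4.0, 2.0, 2.0])
diffs = B.T @ f                                  # f[j] - f[i] over each edge (i, j)

ind_C = np.array([1.0, 1.0, 1.0, 0.0, 0.0])      # indicator of the component {0, 1, 2}
check_C = B.T @ ind_C                            # zero on every edge
check_all = B.T @ np.ones(5)                     # zero on every edge
```

Every edge has both endpoints inside a single component, so the indicator takes the same value at both ends and each difference vanishes.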
The graph Laplacian \(L\) is an \(n \times n\) matrix \(L = D - A = B B^T\). If the graph lies on a regular grid, then \(L = -\Delta\) up to scaling by a finite difference width \(h^2\), but the graph Laplacian is defined for all graphs.
Note that the nullspace of \(L\) is the same as the nullspace of \(B^T\) (the span of indicators on connected components).
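Both facts, the identity \(D - A = B B^T\) and the nullspace spanned by component indicators, can be checked numerically. The sketch below uses dense matrices for readability, and the example graph (a triangle plus a separate edge) is an assumption chosen so that there are two connected components.

```python
import numpy as np

# Hypothetical graph with two components: a triangle {0, 1, 2} and an edge {3, 4}.
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
n, m = 5, len(edges)

A = np.zeros((n, n))
B = np.zeros((n, m))
for k, (i, j) in enumerate(edges):
    A[i, j] = A[j, i] = 1.0
    B[i, k], B[j, k] = -1.0, 1.0

D = np.diag(A.sum(axis=1))
L = D - A

same = np.allclose(L, B @ B.T)                     # checks D - A = B B^T
eigvals = np.linalg.eigvalsh(L)                    # ascending eigenvalues of symmetric L
num_zero = int(np.sum(np.abs(eigvals) < 1e-10))    # dimension of the nullspace
```

Here `num_zero` equals the number of connected components, which is 2 for this graph.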
In most cases, it makes sense to store all these matrices in sparse format.
Exercise
For an undirected graph \(G(V, E)\), let \(n = |V|\) and \(m = |E|\). Give an expression for the number of non-zeros in each of \(A\), \(B\), and \(L\) in terms of \(n\) and \(m\).
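\(A\) and \(B\) both have \(2m\) non-zeros. \(L\) has \(n + 2m\) non-zeros (assuming there are no self-loops and every vertex has at least one neighbor, so each diagonal entry of \(D\) is non-zero).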
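The counts can be checked on a small sparse example. This is a sketch under the assumption of a cycle graph on 6 vertices (so \(n = m = 6\)), with variable names chosen only for illustration.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical example: a cycle on n = 6 vertices, so m = 6 edges.
n = 6
edges = [(i, (i + 1) % n) for i in range(n)]
m = len(edges)

i_idx = np.array([i for i, j in edges])
j_idx = np.array([j for i, j in edges])
k_idx = np.arange(m)

# Adjacency: each undirected edge is stored in both directions.
A = sp.csr_matrix((np.ones(2 * m), (np.r_[i_idx, j_idx], np.r_[j_idx, i_idx])),
                  shape=(n, n))
# Incidence: -1 at the first endpoint and +1 at the second, for each edge.
B = sp.csr_matrix((np.r_[-np.ones(m), np.ones(m)],
                   (np.r_[i_idx, j_idx], np.r_[k_idx, k_idx])), shape=(n, m))
# Laplacian L = D - A.
d = np.asarray(A.sum(axis=1)).ravel()
L = sp.csr_matrix(sp.diags(d) - A)
L.eliminate_zeros()

counts = (A.nnz, B.nnz, L.nnz)
```

For this graph `counts` should match \((2m, 2m, n + 2m)\).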
Random Walks on Graphs
In a random walk on a graph, we consider an agent who starts at a vertex \(i\), then chooses a random neighbor of \(i\) and “walks” along the connecting edge. Typically, we will consider a walk where a neighbor is chosen uniformly at random (i.e. with probability \(1/d_i\)). We’ll assume that every vertex of the graph has at least one neighbor so \(D^{-1}\) makes sense.
This defines a Markov chain with transition matrix \(P = A D^{-1}\) (each column is scaled to sum to 1). Note that even if \(A\) is symmetric (for undirected graphs), \(P\) need not be symmetric because of the scaling by \(D^{-1}\).
The stationary distribution \(x\) of the random walk is the top eigenvector of \(P\); it is guaranteed to have eigenvalue \(1\) and non-negative entries. If we scale \(x\) so \(\|x\|_1 = 1\), the entry \(x_i\) can be interpreted as the probability that a random walker which has walked for a very large number of steps is at vertex \(i\).
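A minimal sketch of this random walk: build \(P = A D^{-1}\) and apply it repeatedly to a starting distribution (power iteration). The example graph is an assumption, chosen to be connected and non-bipartite so that the iteration converges. For an undirected graph the limit is proportional to the degree vector, which gives a simple check.

```python
import numpy as np

# Hypothetical connected, non-bipartite graph: triangle 0-1-2 with pendant vertex 3 on 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=0)
P = A / d                    # broadcasting divides column j by d[j], i.e. P = A D^{-1}

x = np.full(4, 0.25)         # start from the uniform distribution
for _ in range(1000):
    x = P @ x                # one step of the walk, in distribution

x_expected = d / d.sum()     # stationary distribution of an undirected walk
```

Because each column of `P` sums to 1, every iterate stays a probability vector.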
PageRank
PageRank is an early algorithm that was used to rank websites for search engines. The internet can be viewed as a directed graph of websites where there is a directed edge \((i, j)\) if webpage \(j\) links to webpage \(i\). In this case, we compute the degree vector \(d\) using the out-degree (counting the number of links out of a webpage; with this edge convention, these are the column sums of \(A\)). Then the transition matrix \(P = A D^{-1}\) on the directed adjacency matrix defines a random walk on webpages where a user randomly clicks links to get from webpage to webpage. The idea is that more authoritative websites will have more links to them, so a random web surfer will be more likely to end up at them.
One of the issues with this model is that it is easy for a random walker to get “stuck” at a webpage with no out-going links. The idea of PageRank is to add a probability \(\alpha\) that a web surfer will randomly go to another webpage which is not linked to by their current page. In this case, we can write the transition matrix
\begin{equation}P = (1-\alpha) A D^{-1} + \frac{\alpha}{n} 11^T\end{equation}
We then calculate the stationary vector \(x\) of this matrix. Websites with a larger entry \(x_i\) are deemed more authoritative.
Note that because \(A\) is sparse, you’ll typically want to encode \(\frac{1}{n}11^T\) as a linear operator (this takes the average of a vector, and broadcasts it to the appropriate shape). For internet-sized graphs this is a necessity.
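One way to sketch this is with `scipy.sparse.linalg.LinearOperator`, which lets the dense rank-one term be applied without ever forming it. The function names `pagerank_operator` and `stationary_vector`, the value \(\alpha = 0.15\), and the 4-page example web are all assumptions for illustration. The code also assumes every page has at least one out-going link so that \(D^{-1}\) is defined.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator

def pagerank_operator(A, alpha=0.15):
    """P = (1 - alpha) A D^{-1} + (alpha / n) 1 1^T as a matrix-free operator.

    Assumes A[i, j] = 1 when page j links to page i, and that every page
    has at least one out-going link.
    """
    n = A.shape[0]
    out_deg = np.asarray(A.sum(axis=0)).ravel()    # column sums = out-degrees
    W = A @ sp.diags(1.0 / out_deg)                # sparse A D^{-1}

    def matvec(x):
        x = np.ravel(x)
        # (1/n) 1 1^T x is the mean of x broadcast to every entry.
        return (1 - alpha) * (W @ x) + alpha * x.mean() * np.ones(n)

    return LinearOperator((n, n), matvec=matvec, dtype=float)

def stationary_vector(P, iters=200):
    """Power iteration started from the uniform distribution."""
    n = P.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(iters):
        x = P.matvec(x)
        x /= x.sum()             # guard against drift away from sum 1
    return x

# Hypothetical 4-page web with links 0->1, 0->2, 1->2, 2->0, 3->2.
links = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
src = np.array([s for s, t in links])
dst = np.array([t for s, t in links])
A = sp.csr_matrix((np.ones(len(links)), (dst, src)), shape=(4, 4))

P = pagerank_operator(A)
x = stationary_vector(P)
ranking = np.argsort(-x)         # pages from most to least authoritative
```

Power iteration is used here because it only needs matrix-vector products; a sparse eigensolver that accepts a `LinearOperator` would be another option.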
Let’s look at the Les Miserables graph, which encodes interactions between characters in the novel Les Miserables by Victor Hugo.
Let’s now construct the PageRank matrix and compute the top eigenpairs.
The Graph Laplacian
Spectral Embeddings
Spectral embeddings are one way of obtaining locations of vertices of a graph for visualization. The idea is to pretend that all edges are Hooke’s law springs, and to minimize the potential energy of a configuration of vertex locations subject to the constraint that we can’t have all points in the same location.
In one dimension:
\begin{equation}
\mathop{\mathsf{minimize}}_x \sum_{(i,j) \in E} (x_i - x_j)^2 \quad \text{subject to } x^T 1 = 0,\ \|x\|_2 = 1
\end{equation}
Note that the objective function is a quadratic form on the embedding vector \(x\):
\begin{equation}\sum_{(i,j)\in E} (x_i - x_j)^2 = x^T B B^T x = x^T L x\end{equation}
Because the vector \(1\) is in the nullspace of \(L\), this is equivalent to finding the eigenvector of \(L\) with second-smallest eigenvalue.
For a higher-dimensional embedding, we can use the eigenvectors for the next-smallest eigenvalues (the third-smallest, fourth-smallest, and so on).
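A minimal dense sketch of the embedding: take the eigenvectors of \(L\) in ascending order of eigenvalue and skip the first (constant) one. The helper name `spectral_embedding` and the path-graph example are assumptions, and the code assumes a connected graph so that only one eigenvalue is zero.

```python
import numpy as np

def spectral_embedding(A, dim=1):
    """Eigenpairs of L = D - A for the 2nd through (dim + 1)-th smallest eigenvalues.

    Assumes A is symmetric and the graph is connected.
    """
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    return eigvals[1:dim + 1], eigvecs[:, 1:dim + 1]

# Hypothetical example: a path graph on 6 vertices.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

vals, X = spectral_embedding(A, dim=2)
x = X[:, 0]                                    # one-dimensional embedding

L = np.diag(A.sum(axis=1)) - A
energy = x @ L @ x                             # value of the spring objective at x
```

The returned vector satisfies both constraints of the minimization problem (it is orthogonal to the all-ones vector and has unit norm), and `energy` equals the second-smallest eigenvalue. For large sparse graphs a sparse eigensolver would be a more practical choice than a dense decomposition.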
Spectral Clustering
Spectral clustering refers to using a spectral embedding to cluster nodes in a graph. Let \(A, B \subset V\) with \(A \cap B = \emptyset\). We will denote
\begin{equation}E(A, B) = \{(i,j) \in E \mid i\in A, j\in B\}\end{equation}
One way to try to find clusters is to attempt to find a set of nodes \(S \subset V\) with \(\bar{S} = V \setminus S\), so that we minimize the cut objective
\begin{equation}C(S) = \frac{|E(S, \bar{S})|}{\min \{|S|, |\bar{S}|\}}\end{equation}
The Cheeger inequality bounds the second-smallest eigenvalue of \(L\) in terms of the optimal value of \(C(S)\). In fact, a standard way to construct a partition of the graph which is close to the optimal clustering minimizing \(C(S)\) is to look at the eigenvector \(x\) associated with the second-smallest eigenvalue, and let \(S = \{i \in V \mid x_i < 0\}\).
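The sign-based partition and the cut objective can be sketched together. The function names `cut_objective` and `fiedler_partition` and the example (two 4-cliques joined by one edge) are assumptions for illustration, and the code uses a dense eigendecomposition, so it only suits small graphs.

```python
import numpy as np

def cut_objective(A, S):
    """C(S) = |E(S, S-bar)| / min(|S|, |S-bar|) for a boolean mask S."""
    S = np.asarray(S, dtype=bool)
    crossing = A[np.ix_(S, ~S)].sum()      # number of edges between S and its complement
    return crossing / min(S.sum(), (~S).sum())

def fiedler_partition(A):
    """Split vertices by the sign of the eigenvector for the 2nd-smallest eigenvalue of L."""
    L = np.diag(A.sum(axis=1)) - A
    _, eigvecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    return eigvecs[:, 1] < 0

# Hypothetical example: two 4-cliques joined by the single edge (3, 4).
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0

S = fiedler_partition(A)
value = cut_objective(A, S)
```

The sign of an eigenvector is arbitrary, so `S` may come out as either clique; the cut value is the same in both cases.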
As an example, let’s look at a graph generated by a stochastic block model with two clusters. The “ground-truth” clusters are the communities used to generate the graph.
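A stochastic block model can be sampled with NumPy alone. The sketch below is one possible sampler, not necessarily the one used in the original notebook; the function name `sample_sbm`, the block sizes, and the probabilities are illustrative assumptions.

```python
import numpy as np

def sample_sbm(sizes, p, q, seed=0):
    """Sample an undirected SBM adjacency matrix and its community labels.

    Nodes in the same block are joined with probability p, nodes in
    different blocks with probability q; there are no self-loops.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = len(labels)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p, q)
    # Sample each unordered pair i < j once, then mirror to make A symmetric.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(float)
    return A, labels

# Two communities of 30 nodes each (hypothetical parameters).
A, labels = sample_sbm([30, 30], p=0.5, q=0.05)
```

The returned `labels` array holds the ground-truth community of each node and can be compared against any clustering of `A`.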
Now, let’s use spectral clustering to partition into two clusters.
We’ll use the adjusted Rand index to measure the quality of the clustering we obtained. A value of 1 means that we found the true clusters.
In general, you should use a dimension \(d\) embedding when looking for \(d+1\) clusters (so we used a dimension 1 embedding for 2 clusters). Let’s look at 4 clusters in an SBM.
We’ll use K-means clustering in scikit-learn to assign clusters.
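A sketch of the full pipeline for 4 clusters follows: sample an SBM, embed with 3 Laplacian eigenvectors, run K-means, and score with the adjusted Rand index. The helper names, the block sizes, and the values of `p` and `q` are assumptions chosen to give well-separated clusters; `KMeans` and `adjusted_rand_score` are the scikit-learn routines.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def sample_sbm(sizes, p, q, seed=0):
    """Undirected SBM: edge probability p within blocks, q between blocks."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = len(labels)
    probs = np.where(labels[:, None] == labels[None, :], p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(float), labels

def spectral_clusters(A, k, seed=0):
    """Embed with k - 1 Laplacian eigenvectors, then run K-means with k clusters."""
    L = np.diag(A.sum(axis=1)) - A
    _, eigvecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    X = eigvecs[:, 1:k]                       # skip the constant eigenvector
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    return km.fit_predict(X)

# Four communities of 25 nodes each (hypothetical parameters).
A, labels = sample_sbm([25] * 4, p=0.9, q=0.02)
pred = spectral_clusters(A, k=4)
ari = adjusted_rand_score(labels, pred)
print(f"ARI = {ari:.3f}")
```

The adjusted Rand index is invariant to how the cluster labels are numbered, so it does not matter that K-means assigns cluster ids in an arbitrary order.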
Exercise
We’ll consider the stochastic block model with \(k=5\) clusters and \(n=25\) nodes per cluster.
Let \(p\) denote the probability of an edge between nodes in the same cluster, and \(q\) denote the probability of an edge between nodes in different clusters.
Plot a phase diagram of the adjusted Rand index (ARI) score as \(p\) and \(q\) both vary in the range \([0,1]\).