Deep Paper Pool: really deep.

Graph Convolution Basics

Graph

An edge connects vertices $i$ and $j$, and it has weight $W_{ij}$.

The graph Laplacian $L = D - W$, with $D$ the diagonal degree matrix ($D_{ii} = \sum_j W_{ij}$), is a real symmetric matrix.
It has orthonormal eigenvectors $\{u_l\}_{l=0}^{n-1}$ and corresponding eigenvalues $\{\lambda_l\}_{l=0}^{n-1}$, i.e. $L u_l = \lambda_l u_l$.

Assuming the eigenvalues are ordered $0 = \lambda_0 \le \lambda_1 \le \dots \le \lambda_{n-1}$, we denote the entire spectrum by

$$\sigma(L) = \{\lambda_0, \lambda_1, \dots, \lambda_{n-1}\}.$$

Graph Fourier transform

An eigenfunction of a linear operator $A$ is any non-zero function $f$ such that $A f = \lambda f$, where $\lambda$ is a scaling factor called the eigenvalue.
In one-dimensional space, the Laplacian (Laplace operator) is:

$$\Delta f = \frac{\partial^2 f}{\partial x^2}$$

For the classical Fourier transform:

$$\hat{f}(\xi) = \langle f, e^{2\pi i \xi x} \rangle = \int_{\mathbb{R}} f(x)\, e^{-2\pi i \xi x}\, dx$$

where $e^{2\pi i \xi x}$ is an eigenfunction of $\Delta$, since

$$\Delta e^{2\pi i \xi x} = \frac{\partial^2}{\partial x^2} e^{2\pi i \xi x} = -(2\pi\xi)^2\, e^{2\pi i \xi x}.$$

For a graph, we can define the graph Fourier transform as

$$\hat{f}(\lambda_l) = \langle f, u_l \rangle = \sum_{i=1}^{n} f(i)\, u_l^*(i), \qquad \text{i.e. } \hat{f} = U^T f,$$

and the inverse graph Fourier transform as

$$f(i) = \sum_{l=0}^{n-1} \hat{f}(\lambda_l)\, u_l(i), \qquad \text{i.e. } f = U \hat{f}.$$
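
To make the change of basis concrete, here is a minimal NumPy sketch (the function name `graph_fourier` and the toy triangle graph are mine, for illustration):

```python
import numpy as np

def graph_fourier(W, f):
    """Graph Fourier transform of signal f on a graph with weight matrix W."""
    L = np.diag(W.sum(axis=1)) - W      # combinatorial Laplacian L = D - W
    lam, U = np.linalg.eigh(L)          # eigenvalues (ascending), orthonormal eigenvectors
    return lam, U, U.T @ f              # forward transform: f_hat = U^T f

# toy graph: a triangle with unit weights
W = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
f = np.array([1., 2., 3.])
lam, U, f_hat = graph_fourier(W, f)
assert np.allclose(U @ f_hat, f)        # inverse transform f = U f_hat recovers f
```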

Graph Spectral Filtering

In classical signal processing, a filter is:

$$\hat{y}(\xi) = \hat{h}(\xi)\, \hat{f}(\xi)$$

where $\hat{h}(\xi)$ is the transfer function of this filter. By the inverse Fourier transform,

$$y(x) = \int_{\mathbb{R}} \hat{h}(\xi)\, \hat{f}(\xi)\, e^{2\pi i \xi x}\, d\xi = (f * h)(x).$$

For a graph, we define graph spectral (frequency) filtering as

$$\hat{y}(\lambda_l) = \hat{h}(\lambda_l)\, \hat{f}(\lambda_l),$$

and by the inverse graph Fourier transform,

$$y(i) = \sum_{l=0}^{n-1} \hat{h}(\lambda_l)\, \hat{f}(\lambda_l)\, u_l(i).$$

With some matrix manipulation and orthonormality, we can get:

$$y = U\, \hat{h}(\Lambda)\, U^T f = \hat{h}(L)\, f,$$

where $L = U \Lambda U^T$, $U = [u_0, \dots, u_{n-1}]$, and $\hat{h}(\Lambda) = \mathrm{diag}\big(\hat{h}(\lambda_0), \dots, \hat{h}(\lambda_{n-1})\big)$.

By analogy with the classical convolution theorem, we define this spectral filtering, $y = U\,\hat{h}(\Lambda)\,U^T f$ (in either of the forms above), as convolution on the graph.
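
To see the whole pipeline in one place, here is a minimal sketch of graph spectral filtering (the low-pass transfer function $\hat{h}(\lambda) = e^{-\lambda}$ is just an illustrative choice):

```python
import numpy as np

def spectral_filter(W, f, h_hat):
    """Filter a graph signal f through the transfer function h_hat(lambda)."""
    L = np.diag(W.sum(axis=1)) - W          # combinatorial Laplacian
    lam, U = np.linalg.eigh(L)
    return U @ (h_hat(lam) * (U.T @ f))     # y = U h(Lambda) U^T f

W = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
f = np.array([1., 2., 3.])
y = spectral_filter(W, f, h_hat=lambda lam: np.exp(-lam))  # low-pass: damps high frequencies
```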

Chebyshev polynomial expansion

Note: from here on we use the normalized graph Laplacian $L = I_n - D^{-1/2} W D^{-1/2}$, whose eigenvalues lie in $[0, 2]$.
Assume we have a filter $g_\theta(\Lambda) = \mathrm{diag}(\theta)$, parameterized by $\theta \in \mathbb{R}^n$:

$$y = g_\theta(L)\, f = U\, g_\theta(\Lambda)\, U^T f$$

In order to calculate this, we need to compute the full eigendecomposition $L = U \Lambda U^T$ and multiply by the dense matrix $U$, which is expensive for large graphs.
To avoid this, we can treat $g_\theta$ as a function of $\Lambda$, and approximate it by a truncated expansion in terms of Chebyshev polynomials up to order $K - 1$:

$$g_\theta(\Lambda) \approx \sum_{k=0}^{K-1} \theta_k\, T_k(\tilde{\Lambda})$$

with rescaled $\tilde{\Lambda} = 2\Lambda/\lambda_{max} - I_n$, $\theta \in \mathbb{R}^K$ the vector of Chebyshev coefficients, and:

$$T_k(x) = 2x\, T_{k-1}(x) - T_{k-2}(x), \qquad T_0(x) = 1, \quad T_1(x) = x,$$

since the Chebyshev polynomials form an orthogonal basis of $L^2([-1, 1],\, dy/\sqrt{1 - y^2})$, and the rescaling maps the spectrum of $L$ into $[-1, 1]$.
Here we have:

$$y = g_\theta(L)\, f \approx \sum_{k=0}^{K-1} \theta_k\, T_k(\tilde{L})\, f, \qquad \tilde{L} = 2L/\lambda_{max} - I_n.$$

Note that this expression is K-localized, since it is a $K$-th order polynomial of the Laplacian: the filtered value at a vertex depends only on vertices at most $K$ hops away.
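
A minimal NumPy sketch of the recursion (the coefficients `theta` are arbitrary here; in practice they are the learned parameters):

```python
import numpy as np

def chebyshev_filter(L, f, theta, lmax):
    """y = sum_k theta_k T_k(L_tilde) f via the Chebyshev recurrence.

    Only matrix-vector products with L are needed (O(K|E|) when L is sparse),
    no eigendecomposition.
    """
    n = L.shape[0]
    L_tilde = 2.0 * L / lmax - np.eye(n)    # rescale spectrum into [-1, 1]
    t_prev, t_curr = f, L_tilde @ f         # T_0(L~) f = f,  T_1(L~) f = L~ f
    y = theta[0] * t_prev + theta[1] * t_curr
    for k in range(2, len(theta)):
        # Chebyshev recurrence: T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)
        t_next = 2.0 * (L_tilde @ t_curr) - t_prev
        y += theta[k] * t_next
        t_prev, t_curr = t_curr, t_next
    return y

# normalized Laplacian of the toy triangle graph; its eigenvalues lie in [0, 2]
W = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
d = W.sum(axis=1)
L = np.eye(3) - W / np.sqrt(np.outer(d, d))
y = chebyshev_filter(L, np.array([1., 2., 3.]), theta=np.array([.5, .3, .2]), lmax=2.0)
```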

Inception v4, Inception-ResNet and the Impact of Residual Connections on Learning

Paper: arxiv

Key idea:

Here we give clear empirical evidence that training with residual connections accelerates the training of Inception networks significantly.

Some ideas:

The authors argue that residual connections are inherently necessary for training very deep convolutional models. Our findings do not seem to support this view, at least for image recognition. In the experimental section we demonstrate that it is not very difficult to train competitive very deep networks without utilizing residual connections. However the use of residual connections seems to improve the training speed greatly, which is alone a great argument for their use.

Spectral Graph Convolutions for Population based Disease Prediction

Paper: arxiv
Code: github (tensorflow)

Key idea:

Graph Convolutional Networks (GCN) for brain analysis in populations, combining imaging and non-imaging data.

Network Outline:

Task: to assign to each acquisition, corresponding to a subject and time point, a label l ∈ L describing the corresponding subject’s disease state (e.g. control or diseased).
Vertex: We represent the population as a graph where each subject is associated with an imaging feature vector and corresponds to a graph vertex.
Edge: The graph edge weights are derived from phenotypic data, and encode the pairwise similarity between subjects and the local neighbourhood system.
The population graph's adjacency matrix W is defined as follows:

$$W(v, w) = Sim(S_v, S_w) \sum_{h=1}^{H} \gamma\big(M_h(v), M_h(w)\big),$$

where $Sim(S_v, S_w)$ is the similarity between subjects based on imaging measures, $\gamma$ is a measure of distance between phenotypic (non-imaging) measures, and $\{M_h\}_{h=1}^{H}$ is a set of $H$ non-imaging measures (e.g. subject's gender and age).
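
A minimal sketch of assembling such a matrix (the concrete `Sim` and `gamma` below, feature correlation and an exact-match indicator, are illustrative stand-ins rather than the paper's exact definitions):

```python
import numpy as np

def population_adjacency(features, phenotypes):
    """W(v, w) = Sim(S_v, S_w) * sum_h gamma(M_h(v), M_h(w)).

    features:   (n_subjects, n_features) imaging feature vectors S_v
    phenotypes: (n_subjects, H) categorical non-imaging measures M_h (e.g. sex, site)
    """
    n = features.shape[0]
    sim = np.corrcoef(features)             # illustrative Sim: feature correlation
    W = np.zeros((n, n))
    for v in range(n):
        for w in range(v + 1, n):
            # illustrative gamma: 1 per phenotypic measure that matches exactly
            gamma_sum = np.sum(phenotypes[v] == phenotypes[w])
            W[v, w] = W[w, v] = sim[v, w] * gamma_sum
    return W

rng = np.random.default_rng(0)
W = population_adjacency(rng.normal(size=(10, 50)),      # 10 subjects, 50 imaging features
                         rng.integers(0, 2, size=(10, 2)))  # 2 binary phenotypes
```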

GCN: check this blog and this paper Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
Training: This structure is used to train a GCN model on partially labelled graphs, aiming to infer the classes of unlabelled nodes from the node features and pairwise associations between subjects.
Network detail (a minimal sketch of the forward pass follows the list):

  • ReLU activation after each graph convolutional layer
  • Softmax activation in final layer
  • Loss: cross-entropy
  • Unlabelled nodes are then assigned the labels maximising the softmax output.
  • Dropout
  • l2 regularisation
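
A minimal sketch of such a forward pass (two layers with Chebyshev order K = 2 for brevity; shapes and layer count are my assumptions, not the paper's configuration):

```python
import numpy as np

def gcn_forward(L_tilde, X, theta1, theta2):
    """Two Chebyshev graph-conv layers: ReLU after the first, softmax at the end."""
    def cheb_conv(H, theta):
        # K = 2 terms: T_0(L~) H = H and T_1(L~) H = L~ H, each with its own weights
        return H @ theta[0] + (L_tilde @ H) @ theta[1]
    H = np.maximum(cheb_conv(X, theta1), 0.0)       # graph conv + ReLU
    Z = cheb_conv(H, theta2)                        # final graph conv
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)         # row-wise softmax over classes

rng = np.random.default_rng(0)
probs = gcn_forward(L_tilde=np.eye(4),              # placeholder rescaled Laplacian
                    X=rng.normal(size=(4, 3)),      # 4 nodes, 3 input features
                    theta1=[rng.normal(size=(3, 8)) for _ in range(2)],
                    theta2=[rng.normal(size=(8, 2)) for _ in range(2)])
```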

Dataset Detail:

Autism Brain Imaging Data Exchange (ABIDE)

  • Task: classify subjects as healthy or suffering from Autism Spectrum Disorder (ASD).
  • Objective: exploit the acquisition information which can strongly affect the comparability of subjects.
  • Dataset: ABIDE
  • Dataset Detail:
    1. 871 subjects, 403 ASD and 468 healthy controls.
    2. 20 different sites
    3. Preprocessing pipeline from C-PAC & ROI from Harvard Oxford (HO) atlas, same as Ruckert2016
    4. The individual connectivity matrices are estimated by computing the Fisher-transformed Pearson's correlation coefficient between the representative rs-fMRI time series of each ROI in the HO atlas (see the sketch below).
  • Input feature (vertex): the vectorised functional connectivity matrix. A ridge classifier is employed to select the most discriminative features from the training set.
  • Adjacency matrix (edge and weight):
    • $Sim(S_v, S_w)$ is the correlation distance between the subjects' rs-fMRI connectivity networks after feature selection.
    • non-imaging measures: subject’s gender and acquisition site

Alzheimer’s Disease Neuroimaging Initiative (ADNI)

  • Task: predict whether an MCI patient will convert to AD.
  • Objective: demonstrate the importance of exploiting longitudinal information, which can be easily integrated into our graph structure, to increase performance.
  • Dataset: ADNI
  • Dataset Detail:
    1. 540 subjects (1675 samples) with early/late MCI and longitudinal T1 MR images, plus 289 subjects (843 samples) diagnosed with AD
    2. Acquisitions after conversion to AD were not included.
  • Input feature (vertex): volumes of all 138 segmented brain structures
  • Adjacency matrix (edge and weight):
    • $Sim(S_v, S_w)$: similarity between the subjects' imaging feature vectors (the segmented structure volumes)
    • non-imaging measures: subject’s gender and age information

Results

  • 10-fold stratified cross validation strategy used.
  • Chebyshev polynomials of order K = 3 are used.
  • In ADNI, longitudinal acquisitions of the same subject are in the same fold.

Autism Brain Imaging Data Exchange (ABIDE)

  • Result: We show how integrating acquisition information allows us to outperform the current state of the art on the whole dataset, with a global accuracy of 69.5%.

Alzheimer’s Disease Neuroimaging Initiative (ADNI)

  • Result: an average accuracy of 77%, on par with state-of-the-art results and corresponding to a 10% increase over a standard linear classifier.

Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering

Paper: arxiv
Code: github
Submitted: 30 Jun 2016

Key Idea:

We present a formulation of CNNs in the context of spectral graph theory, … design fast localized convolutional filters on graphs.

Claimed contribution:

  1. Spectral formulation: tools from graph signal processing (GSP) applied to CNNs
  2. Strictly localized filters: filters strictly localized in a ball of radius K
  3. Low computational complexity: linear complexity w.r.t. the input data size n. No Fourier basis is needed, so there is no eigenvalue decomposition and no basis to store; only the Laplacian, a sparse matrix with $|\mathcal{E}|$ non-zero values, needs to be stored.
  4. Efficient pooling: pooling analogous to the pooling of 1D signals, achieved by rearranging the vertices into a binary tree structure.
  5. Experimental results

Background Knowledge:

  1. Hilbert Space

    A Hilbert space H is a real or complex inner product space that is also a complete metric space with respect to the distance function induced by the inner product.
    (wiki) The inner product of functions $f$ and $g$ in $L^2$ is

    $$\langle f, g \rangle = \int f(x)\, \overline{g(x)}\, dx.$$

  2. Square-integrable function

    In mathematics, a square-integrable function, also called a quadratically integrable function, is a real- or complex-valued measurable function for which the integral of the square of the absolute value is finite.
    A space which is complete under the metric induced by a norm is a Banach space. Therefore, the space of square integrable functions is a Banach space, under the metric induced by the norm, which in turn is induced by the inner product. As we have the additional property of the inner product, this is specifically a Hilbert space, because the space is complete under the metric induced by the inner product.

  3. For some detailed Chebyshev calculations, check Section III, Part C, "The Chebyshev Polynomial Approximation"

Methodology

Fast localized spectral filters
Graph coarsening

Use the coarsening phase of the Graclus multilevel clustering algorithm, with the normalized cut as the spectral clustering objective. Graclus' greedy rule consists, at each coarsening level, of picking an unmarked vertex $i$ and matching it with one of its unmarked neighbours $j$ that maximizes the local normalized cut $W_{ij}(1/d_i + 1/d_j)$. The two matched vertices are then marked and the coarsened weights are set as the sum of their weights.
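
A minimal sketch of one level of that greedy matching (dense weight matrix for simplicity; the real Graclus operates on sparse graphs):

```python
import numpy as np

def greedy_matching(W):
    """Pair each vertex with the unmarked neighbour maximizing the local
    normalized cut W_ij * (1/d_i + 1/d_j); unmatched vertices stay singletons."""
    n = W.shape[0]
    d = W.sum(axis=1)                        # vertex degrees
    marked = np.zeros(n, dtype=bool)
    pairs = []
    for i in range(n):
        if marked[i]:
            continue
        marked[i] = True
        best_j, best_cut = None, 0.0
        for j in np.flatnonzero(W[i] > 0):
            if not marked[j]:
                cut = W[i, j] * (1.0 / d[i] + 1.0 / d[j])
                if cut > best_cut:
                    best_j, best_cut = j, cut
        if best_j is None:
            pairs.append((i,))               # singleton: no unmarked neighbour left
        else:
            marked[best_j] = True
            pairs.append((i, best_j))
    return pairs
```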

Fast Pooling of Graph Signals

After coarsening, the nodes in each pair are pooled (the paper uses max pooling); a node left without a match (called a singleton) is paired with a fake node initialized with a neutral value (e.g. 0 when using ReLU and max pooling). This gives pooling of size 2; pooling of size 4 is done by applying 2-pooling twice.
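
A minimal sketch of the resulting pooling, assuming the signal has already been rearranged so that matched vertices (fake nodes included, padded with 0) sit at consecutive indices:

```python
import numpy as np

def graph_max_pool_2(x):
    """Size-2 max pooling of a rearranged graph signal x of shape (n, channels)."""
    return x.reshape(x.shape[0] // 2, 2, -1).max(axis=1)

x = np.array([[1.], [3.], [0.], [2.]])   # index 2 is a fake node padded with 0
assert np.allclose(graph_max_pool_2(x), [[3.], [2.]])
```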

Result and discussion

MNIST

Built an 8-NN (k-nearest-neighbour) graph from the image's 2D grid; almost the same performance as a regular CNN on image data (CNN vs GCNN: 99.33% vs 99.14%).
They say this may be due to the isotropic nature of the spectral filters, i.e. the fact that edges in a general graph do not possess an orientation (like up, down, left and right for pixels on a 2D grid).

20NEWS

Representing documents as bag-of-words on a graph of words built from word embeddings gets good results.

Findings

Their spectral filter has linear O(n) complexity compared to other filters, and it parallelizes easily on GPUs, giving an 8x speedup. The better the graph quality, the better the result.

Sublime Text Markdown Support

  1. Install MarkdownEditing, Markdown Preview and LiveReload via Shift+Ctrl+P -> Install Package.
  2. Open a .md file and set the syntax to Markdown -> Markdown in the bottom right corner of Sublime Text.
  3. Open Preferences -> Settings - Syntax Specific and add the following line to Markdown.sublime-settings:
    "color_scheme": "Packages/MarkdownEditing/MarkdownEditor-Dark.tmTheme"
  4. Use Shift+Ctrl+P -> Markdown Preview: Preview in Browser -> Github to preview the output.