Distance Metric Learning using Graph Convolutional Networks: Application to Functional Brain Networks

24 Jul 2017

Paper: arxiv
Code: github (tensorflow)
Submitted: 7 Mar 2017

Key Idea:

The paper proposes a novel method for learning a similarity metric between irregular graphs with known node correspondences.

Background knowledge:

A Siamese neural network is a class of neural network architectures that contain two or more identical subnetworks. Identical here means that they have the same configuration with the same parameters and weights. Parameter updates are mirrored across the subnetworks. Siamese networks are popular for tasks that involve finding the similarity or a relationship between two comparable things.
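
To make the weight sharing concrete, here is a minimal sketch in plain Python. The one-layer `embed` subnetwork, the `shared_weights` values and the cosine comparison are all hypothetical choices for illustration; the paper's network uses graph convolutions instead.

```python
import math

def embed(x, weights):
    """Hypothetical one-layer subnetwork: tanh of a linear map."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def siamese_similarity(x_a, x_b, weights):
    """Pass both inputs through the SAME weights, then compare the embeddings."""
    e_a = embed(x_a, weights)
    e_b = embed(x_b, weights)
    dot = sum(p * q for p, q in zip(e_a, e_b))
    norm_a = math.sqrt(sum(p * p for p in e_a))
    norm_b = math.sqrt(sum(q * q for q in e_b))
    return dot / (norm_a * norm_b)  # cosine similarity (an assumed comparison)

# One weight matrix serves both branches, so there is nothing to keep in sync.
shared_weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
same = siamese_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], shared_weights)
diff = siamese_similarity([1.0, 2.0, 3.0], [-1.0, 0.5, 2.0], shared_weights)
```

Because both branches use one set of weights, identical inputs always map to identical embeddings, and the similarity does not depend on the order of the pair.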

Degree matrix or diagonal degree matrix

The degree matrix is a diagonal matrix which contains information about the degree of each vertex—that is, the number of edges attached to each vertex.

Adjacency matrix

An adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.

Laplacian matrix / Symmetric normalized Laplacian
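
With degree matrix $D$ and adjacency matrix $A$, the Laplacian is $L = D - A$ and the symmetric normalized Laplacian is $L^{sym} = I - D^{-1/2} A D^{-1/2}$. The sketch below computes both for a small path graph; the function names are made up for this note, and treating isolated nodes as zero rows is an assumed convention.

```python
import math

def degree_matrix(adj):
    """Diagonal matrix whose i-th entry is the (weighted) degree of node i."""
    n = len(adj)
    return [[sum(adj[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]

def laplacian(adj):
    """Combinatorial Laplacian L = D - A."""
    deg = degree_matrix(adj)
    n = len(adj)
    return [[deg[i][j] - adj[i][j] for j in range(n)] for i in range(n)]

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}."""
    n = len(adj)
    d = [sum(row) for row in adj]
    # Assumed convention: an isolated node (degree 0) gets an all-zero row.
    inv_sqrt = [1.0 / math.sqrt(x) if x > 0 else 0.0 for x in d]
    return [
        [
            (1.0 if i == j and d[i] > 0 else 0.0)
            - inv_sqrt[i] * adj[i][j] * inv_sqrt[j]
            for j in range(n)
        ]
        for i in range(n)
    ]

# Path graph 0 - 1 - 2
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
L = laplacian(adj)               # [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
L_sym = normalized_laplacian(adj)
```

Every row of $L$ sums to zero, which is a quick sanity check when building graphs by hand.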

Chebyshev polynomials of the first kind
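
Chebyshev polynomials of the first kind follow the recurrence $T_0(x) = 1$, $T_1(x) = x$, $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$. Spectral graph convolutions of the Chebyshev type apply the same recurrence to a rescaled Laplacian instead of a scalar. The sketch below only covers the scalar case; the function name is made up for this note.

```python
import math

def chebyshev(k, x):
    """Evaluate T_k(x) with the three-term recurrence."""
    if k == 0:
        return 1.0
    prev, curr = 1.0, x          # T_0 and T_1
    for _ in range(k - 1):
        prev, curr = curr, 2.0 * x * curr - prev
    return curr

t2 = chebyshev(2, 0.5)           # 2 * 0.5**2 - 1 = -0.5
t3 = chebyshev(3, 0.5)           # 4 * 0.5**3 - 3 * 0.5 = -1.0

# Known identity: T_k(cos(a)) = cos(k * a)
angle = 0.7
identity_gap = abs(chebyshev(3, math.cos(angle)) - math.cos(3 * angle))
```

The recurrence is what makes these filters cheap: each extra order costs one more multiplication by the (sparse) Laplacian rather than an eigendecomposition.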

Methodology:

Check this blog

Dataset & preprocess:

Dataset: Autism Brain Imaging Data Exchange (ABIDE)
Preprocessing pipeline: Configurable Pipeline for the Analysis of Connectomes (C-PAC)

  
  Including:
    * skull stripping
    * slice timing correction
    * motion correction
    * global mean intensity normalisation 
    * nuisance signal regression 
    * band-pass filtering (0.01-0.1Hz)
    * registration of fMRI images to standard anatomical space (MNI152)

ROI:
- Harvard Oxford (HO) atlas (R = 110 cortical and subcortical ROIs)
- The mean time series is extracted for each ROI.
- Each time series is normalised to zero mean and unit variance.
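
The normalisation step, and the correlation matrix later used as node features, can be sketched as follows. This is plain Python on toy data; the function names are made up, Pearson correlation is assumed, and the population standard deviation is an assumed choice.

```python
import math

def zscore(series):
    """Normalise one ROI time series to zero mean and unit variance."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    # A constant series has std == 0 and would need special handling.
    return [(x - mean) / std for x in series]

def correlation_matrix(roi_series):
    """Pearson correlation between every pair of ROI time series."""
    z = [zscore(s) for s in roi_series]
    t = len(z[0])
    # For z-scored signals the correlation is the mean of their product.
    return [[sum(a * b for a, b in zip(zi, zj)) / t for zj in z] for zi in z]

# Toy data: 3 ROIs, 4 time points each
toy = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
corr = correlation_matrix(toy)
```

In the toy data the second series is a scaled copy of the first and the third is its reverse, so the correlations with the first ROI come out as +1 and -1.
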
Number:

  
  Number of subjects: N = 871 
  ASD: 403 
  Healthy controls: 468 
  Number of sites: 20
  (871 subjects from the different imaging sites met the imaging quality and phenotypic information criteria)

Network detail:

Task: measure the similarity between two graphs
Graph:
- Vertex: Each ROI is represented by a node $v_i\in\mathcal{V}$
- Input feature: for each ROI, the input feature is the corresponding row of the correlation matrix.
- Edge & weight:
  - The paper states that the edge weight is $e_{ij}=d(v_i,v_j)=\sqrt{\|v_i-v_j\|^2}$
  - The code uses $W_{ij} = \begin{cases} \exp\left(-\frac{\mathrm{dist}(i,j)^2}{2\theta^2}\right), & \text{if } \mathrm{dist}(i,j)\le\kappa \\ 0, & \text{otherwise} \end{cases}$ as the weight ($\theta$ and $\kappa$ are parameters)
  - The edges are determined by k-NN (k-nearest neighbours).
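
A minimal sketch of a k-NN graph with Gaussian edge weights, assuming each node has coordinates to measure distances on. The function name, the toy points and the symmetrisation by taking the larger weight are assumptions for illustration, not taken from the authors' code.

```python
import math

def knn_gaussian_graph(points, k, theta):
    """Weight matrix with exp(-dist^2 / (2 theta^2)) on k-NN edges, 0 elsewhere."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Indices of the k nodes closest to node i (excluding i itself).
        neighbours = sorted(
            (j for j in range(n) if j != i), key=lambda j: dist[i][j]
        )[:k]
        for j in neighbours:
            w = math.exp(-dist[i][j] ** 2 / (2.0 * theta ** 2))
            # Assumed symmetrisation: keep the edge if either node selects it.
            weights[i][j] = max(weights[i][j], w)
            weights[j][i] = max(weights[j][i], w)
    return weights

toy_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
W = knn_gaussian_graph(toy_points, k=1, theta=1.0)
```

With `k=1` each node links only to its single nearest neighbour, so nodes 1 and 2 both attach to node 0 but not to each other, and the far-away node 3 gets a near-zero weight.
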
Network Structure:
1. Graph convolutional layers:
  1. 2 layers with 64 features each (shared across the Siamese network)
  2. K=3: each convolution takes input from nodes at most K steps away from a given node.
2. FC:
  1. One output with sigmoid activation $S(x)={\frac {1}{1+e^{-x}}}={\frac {e^{x}}{e^{x}+1}}.$
  2. A binary feature is introduced at the FC layer indicating whether the two subjects in a pair were scanned at the same site.
  3. Dropout of 0.2 on the FC layer
Loss function:

$J^g=(\sigma^{2+}+\sigma^{2-})+\lambda \max(0,\, m-(\mu^+-\mu^-))$

The loss maximises the mean similarity $\mu^+$ between embeddings belonging to the same class and minimises the mean similarity $\mu^-$ between embeddings belonging to different classes. It also minimises the variance of the pairwise similarities for both matching ($\sigma^{2+}$) and non-matching ($\sigma^{2-}$) pairs of graphs.
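
The loss is easy to evaluate by hand on a batch of similarity scores. Below is a sketch in plain Python; the function name and toy scores are made up, the defaults `margin=0.6` and `lam=0.35` are the values quoted in these notes, and the use of the population variance is an assumption.

```python
def global_loss(sim_matching, sim_non_matching, margin=0.6, lam=0.35):
    """Variance terms plus a hinge on the gap between the two mean similarities."""
    def mean(values):
        return sum(values) / len(values)

    def variance(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    mu_pos, mu_neg = mean(sim_matching), mean(sim_non_matching)
    var_pos, var_neg = variance(sim_matching), variance(sim_non_matching)
    # The hinge is zero once the means are separated by at least the margin.
    return (var_pos + var_neg) + lam * max(0.0, margin - (mu_pos - mu_neg))

# Well-separated scores: the hinge vanishes and only the variances remain.
separated = global_loss([0.9, 0.8, 1.0], [0.1, 0.2, 0.0])
# Indistinguishable scores: zero variance, full hinge penalty lam * margin.
collapsed = global_loss([0.5, 0.5], [0.5, 0.5])
```

The two toy cases show the two pressures separately: the first is penalised only for spread within each group, the second only for failing to separate the group means.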

Training hyperparameters:
- Adam optimizer: 0.001 learning rate and 0.005 regularization
- Loss function: margin m=0.6, weight lambda=0.35
- mini-batch: 200
Train and test:
1. 871 subjects in total: 720 for training and 151 for testing.
2. The training set forms 21802 matching and 21398 non-matching graph pairs; the test set forms 5631 matching and 5694 non-matching pairs.
3. All graphs are fed to the network the same number of times to avoid biases.
4. Subjects from all 20 sites are included in both the training and test sets.
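
A quick arithmetic check on the pair counts, using only the numbers quoted above. The variable names are made up; the conclusion that training uses a subset of all possible pairs is an inference from the counts, not something stated in these notes.

```python
from math import comb

train_subjects, test_subjects = 720, 151

# Number of unordered subject pairs that each split could form.
all_test_pairs = comb(test_subjects, 2)     # 11325
all_train_pairs = comb(train_subjects, 2)   # 258840

reported_test_pairs = 5631 + 5694           # 11325
reported_train_pairs = 21802 + 21398        # 43200
```

The reported test pairs equal every possible pairing of the 151 test subjects, while the 43200 training pairs are far fewer than the 258840 possible ones, so the training pairs appear to be a subset.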

Results:

To demonstrate the learned metric's ability to facilitate a subject classification task (ASD vs control), the authors use a simple k-NN classifier with k = 3 based on the estimated distances. Compared with PCA/Euclidean distance, the improvement in classification scores reaches 11.9% on the total test set and up to 40% for individual sites.
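
A k-NN classifier on top of a learned metric only needs the distances from a test subject to the labelled subjects. The sketch below uses made-up distances and a hypothetical function name; how the network's similarity output is converted into a distance is not covered here.

```python
from collections import Counter

def knn_predict(distances_to_train, train_labels, k=3):
    """Majority vote among the k training subjects with the smallest distance."""
    nearest = sorted(
        range(len(train_labels)), key=lambda i: distances_to_train[i]
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

labels = ["ASD", "control", "ASD", "control", "control"]
# Toy distances from one test subject to the five labelled subjects.
pred_a = knn_predict([0.1, 0.9, 0.2, 0.8, 0.3], labels)
pred_b = knn_predict([0.9, 0.1, 0.8, 0.2, 0.3], labels)
```

With an odd `k` and two classes the vote can never tie, which is one practical reason to pick k = 3.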

Personal thought:

What is the point of the Siamese network? Why not just train a classifier with GCN and FC layers directly (or does that approach perform poorly)?
