Distance Metric Learning using Graph Convolutional Networks: Application to Functional Brain Networks

24 Jul 2017

Paper: arxiv
Code: github (tensorflow)
Submitted: 7 Mar 2017

Key Idea:

We propose a novel method for learning a similarity metric between irregular graphs with known node correspondences.

Background knowledge:

A Siamese neural network is a class of neural network architectures that contain two or more identical subnetworks. Identical here means they have the same configuration with the same parameters and weights, and parameter updates are mirrored across all subnetworks. Siamese networks are popular for tasks that involve finding the similarity or relationship between two comparable things.
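A minimal sketch of the weight-sharing idea in Python with NumPy. The two-layer tanh encoder, the input size and the cosine-similarity comparison are illustrative assumptions, not the architecture used in the paper; the point is that both inputs pass through one set of weights (`W1`, `W2`), so any update to those weights applies to both branches at once.

```python
import numpy as np

rng = np.random.default_rng(0)

# One parameter set used by BOTH branches: this is what makes the network "Siamese".
# The tanh encoder is a hypothetical stand-in for the real subnetwork.
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((8, 4))


def encode(x: np.ndarray) -> np.ndarray:
    """Shared subnetwork: maps a 16-dim input vector to a 4-dim embedding."""
    return np.tanh(np.tanh(x @ W1) @ W2)


def siamese_similarity(x_a: np.ndarray, x_b: np.ndarray) -> float:
    """Embed both inputs with the same weights, then compare embeddings by cosine similarity."""
    e_a, e_b = encode(x_a), encode(x_b)
    return float(e_a @ e_b / (np.linalg.norm(e_a) * np.linalg.norm(e_b)))


x = rng.standard_normal(16)
y = rng.standard_normal(16)
print(siamese_similarity(x, x))  # identical inputs: 1.0 up to floating-point rounding
print(siamese_similarity(x, y))
```

Because the branches share parameters, the score is symmetric in its two arguments, and identical inputs always receive the maximum similarity.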

Degree matrix or diagonal degree matrix

The degree matrix is a diagonal matrix which contains information about the degree of each vertex—that is, the number of edges attached to each vertex.

Adjacency matrix

An adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.
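A small sketch of both definitions for an undirected graph: the adjacency matrix is built from an edge list, and the degree matrix places each row sum of the adjacency matrix on the diagonal. The example graph and helper names are hypothetical. For a weighted graph, such as the brain graphs below, the same row sum gives the weighted degree.

```python
import numpy as np


def adjacency_from_edges(n: int, edges: list[tuple[int, int]]) -> np.ndarray:
    """Symmetric adjacency matrix of an undirected, unweighted graph with n vertices."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A


def degree_matrix(A: np.ndarray) -> np.ndarray:
    """Diagonal degree matrix: D[i, i] = number of edges attached to vertex i (row sum of A)."""
    return np.diag(A.sum(axis=1))


# Path 0 - 1 - 2 plus an extra edge 1 - 3, so vertex 1 has degree 3.
A = adjacency_from_edges(4, [(0, 1), (1, 2), (1, 3)])
D = degree_matrix(A)
print(np.diag(D))  # [1. 3. 1. 1.]
```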

Laplacian matrix / Symmetric normalized Laplacian
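For reference, the symmetric normalized Laplacian is $L = I - D^{-1/2} A D^{-1/2}$, where $A$ is the adjacency matrix and $D$ the degree matrix; its eigenvalues lie in $[0, 2]$. The sketch below computes it with NumPy. Setting $d^{-1/2} = 0$ for isolated vertices is a common convention and an assumption here.

```python
import numpy as np


def normalized_laplacian(A: np.ndarray) -> np.ndarray:
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.

    Vertices with degree 0 get d^{-1/2} := 0, which avoids division by zero.
    """
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    nonzero = d > 0
    d_inv_sqrt[nonzero] = d[nonzero] ** -0.5
    # Scale row i by d_i^{-1/2} and column j by d_j^{-1/2}.
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


# 3-node path graph: its normalized Laplacian has eigenvalues 0, 1 and 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = normalized_laplacian(A)
print(np.linalg.eigvalsh(L))
```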

Chebyshev polynomials of the first kind
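Chebyshev polynomials of the first kind satisfy $T_0(x) = 1$, $T_1(x) = x$ and $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$. The sketch below assumes the ChebNet-style spectral graph convolution of Defferrard et al. (2016), in which a filter is $y = \sum_{k} \theta_k T_k(\tilde{L})\,x$ with the rescaled Laplacian $\tilde{L} = 2L/\lambda_{\max} - I$; the function names and the default $\lambda_{\max} = 2$ (an upper bound for the normalized Laplacian) are illustrative choices.

```python
import numpy as np


def chebyshev_T(k: int, x: float) -> float:
    """T_k(x) via the recurrence T0 = 1, T1 = x, Tk = 2x*T(k-1) - T(k-2)."""
    t_prev, t_curr = 1.0, x
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr


def cheb_filter(L: np.ndarray, x: np.ndarray, theta: np.ndarray, lmax: float = 2.0) -> np.ndarray:
    """Apply y = sum_k theta[k] * T_k(L_tilde) @ x, with L_tilde = 2L/lmax - I.

    The same recurrence is applied to vectors, so only matrix-vector products are needed.
    """
    L_tilde = 2.0 * L / lmax - np.eye(L.shape[0])
    t_prev, t_curr = x, L_tilde @ x  # T0(L~) x and T1(L~) x
    y = theta[0] * t_prev
    if len(theta) > 1:
        y = y + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2 * L_tilde @ t_curr - t_prev
        y = y + theta[k] * t_curr
    return y
```

Evaluating the polynomial through the recurrence avoids forming powers of $\tilde{L}$ explicitly, which is why this formulation stays cheap on sparse graphs.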

Methodology:

Check this blog

Dataset & preprocessing:

Dataset: Autism Brain Imaging Data Exchange (ABIDE)
Preprocessing pipeline: Configurable Pipeline for the Analysis of Connectomes (C-PAC)

  
  Including:
    * skull stripping
    * slice timing correction
    * motion correction
    * global mean intensity normalisation
    * nuisance signal regression
    * band-pass filtering (0.01-0.1 Hz)
    * registration of fMRI images to standard anatomical space (MNI152)

ROI:
- Harvard-Oxford (HO) atlas (R = 110 cortical and subcortical ROIs)
- Extract the mean time series for each ROI
- Normalise each time series to zero mean and unit variance.
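The two ROI steps above (mean time series per ROI, then z-normalisation) can be sketched in NumPy as follows. The `(T, V)` voxel layout, the integer label array standing in for the atlas, and the toy sizes are hypothetical; in a real pipeline the atlas parcellation is applied to preprocessed NIfTI images.

```python
import numpy as np


def roi_time_series(voxels: np.ndarray, labels: np.ndarray, n_rois: int) -> np.ndarray:
    """Average voxel time series within each ROI, then z-normalise each ROI series.

    voxels: (T, V) array, one BOLD time series per voxel (hypothetical layout).
    labels: (V,) integer atlas label per voxel in 0..n_rois-1; every ROI is assumed non-empty.
    Returns a (T, n_rois) array with zero mean and unit variance per column.
    """
    ts = np.stack([voxels[:, labels == r].mean(axis=1) for r in range(n_rois)], axis=1)
    return (ts - ts.mean(axis=0)) / ts.std(axis=0)


rng = np.random.default_rng(0)
voxels = rng.standard_normal((100, 500))   # toy data: 100 time points, 500 voxels
labels = np.arange(500) % 5                # toy atlas with 5 ROIs (the HO atlas has 110)
ts = roi_time_series(voxels, labels, n_rois=5)
print(ts.shape)  # (100, 5)
```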
Numbers:

  Number of subjects: N = 871
  ASD: 403
  Healthy controls: 468
  Number of sites: 20
  (Subjects come from different imaging sites; these 871 met the imaging quality and phenotypic information criteria.)

Network detail:

Task: measure the similarity between two graphs
Graph:
- Vertex: each ROI is represented by a node $v_i\in\mathcal{V}$
- Input feature: for each ROI, the input feature is the corresponding row of the correlation matrix.
- Edge & weight:
  - In the paper, the edge weight is given as $e_{ij}=d(v_i,v_j)=\sqrt{\|v_i-v_j\|^2}$, i.e. the Euclidean distance between nodes.
  - In their code, the weight is $W_{ij} = \begin{cases} \exp\left(-\frac{\mathrm{dist}(i,j)^2}{2\theta^2}\right), & \text{if } \mathrm{dist}(i,j)\le\kappa \\ 0, & \text{otherwise} \end{cases}$ ($\theta$ and $\kappa$ are hyperparameters).
  - Edges are determined by k-NN (k-nearest neighbors).
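A NumPy sketch of the graph construction listed above: node features are rows of the Pearson correlation matrix, and edges connect each node to its k nearest neighbours with the Gaussian weight $\exp(-\mathrm{dist}^2 / 2\theta^2)$. Which vectors the distances are measured between (here, hypothetical ROI centre coordinates), the symmetrisation rule, and the values of `k` and `theta` are all assumptions, not taken from the authors' code.

```python
import numpy as np


def correlation_features(ts: np.ndarray) -> np.ndarray:
    """(T, R) ROI time series -> (R, R) Pearson correlation matrix; row i is node i's features."""
    return np.corrcoef(ts, rowvar=False)


def knn_gaussian_graph(points: np.ndarray, k: int, theta: float) -> np.ndarray:
    """Symmetric k-NN graph with Gaussian weights exp(-dist^2 / (2 theta^2)).

    points: (R, p) array with one vector per node (assumed: ROI centre coordinates).
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances
    W = np.zeros_like(dist)
    for i in range(len(points)):
        order = np.argsort(dist[i])
        nbrs = order[order != i][:k]              # k nearest neighbours, excluding i itself
        W[i, nbrs] = np.exp(-dist[i, nbrs] ** 2 / (2 * theta ** 2))
    return np.maximum(W, W.T)                     # keep an edge if either endpoint selected it


rng = np.random.default_rng(0)
ts = rng.standard_normal((120, 10))              # hypothetical (T, R) ROI time series
X = correlation_features(ts)                     # (10, 10) node feature matrix
coords = rng.uniform(-70, 70, size=(10, 3))      # hypothetical ROI centres in mm
W = knn_gaussian_graph(coords, k=3, theta=20.0)
```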
Network Structure:
1. GCN (graph convolution):
  1. 2 layers with 64 features each, shared between the two branches of the Siamese network
  2. K = 3: each convolution takes input from nodes at most K steps away.
2. FC:
  1. One output unit with sigmoid activation $S(x)=\frac{1}{1+e^{-x}}=\frac{e^{x}}{e^{x}+1}$.
  2. A binary feature indicating whether the two subjects were scanned at the same site is introduced at the FC layer.
  3. Dropout of 0.2 on the FC layer.
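A hypothetical NumPy sketch of the FC head described above: one sigmoid output, a binary same-site feature appended to the FC input, and dropout of 0.2 at training time. How the two branch outputs are combined before the FC layer is not stated in these notes; per-node inner products are assumed here, and the names `similarity_head`, `w` and `b` are illustrative.

```python
import numpy as np


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + np.exp(-x))


def similarity_head(h_a, h_b, same_site, w, b, drop_rate=0.2, training=False, rng=None):
    """Hypothetical FC head producing a similarity score in (0, 1).

    h_a, h_b: (R, F) node embeddings from the two shared-weight GCN branches.
    Combining them by per-node inner products is an assumption of this sketch.
    """
    z = np.einsum("rf,rf->r", h_a, h_b)          # (R,) per-node inner products
    if training:                                # inverted dropout, training time only
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(z.shape) >= drop_rate
        z = z * keep / (1.0 - drop_rate)
    z = np.append(z, float(same_site))          # binary same-site feature
    return float(sigmoid(z @ w + b))


rng = np.random.default_rng(0)
R, F = 110, 64
h_a = rng.standard_normal((R, F)) * 0.1
h_b = rng.standard_normal((R, F)) * 0.1
w = rng.standard_normal(R + 1) * 0.01           # FC weights: R inner products + 1 site flag
score = similarity_head(h_a, h_b, same_site=True, w=w, b=0.0)
```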
Loss function:

$J^g=(\sigma^{2+}+\sigma^{2-})+\lambda \max(0,\, m-(\mu^+-\mu^-))$

It maximises the mean similarity $\mu^+$ between embeddings belonging to the same class, minimises the mean similarity $\mu^-$ between embeddings belonging to different classes, and minimises the variance of pairwise similarities for both matching ($\sigma^{2+}$) and non-matching ($\sigma^{2-}$) pairs of graphs.
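A NumPy sketch of $J^g$ for one mini-batch, using the margin $m = 0.6$ and weight $\lambda = 0.35$ listed below. Assumptions: `sim` holds the predicted similarities of the pairs, `match` marks same-class pairs, and the variances are population variances (NumPy's default `ddof=0`); the notes do not say which estimator the authors used.

```python
import numpy as np


def siamese_gcn_loss(sim: np.ndarray, match: np.ndarray, m: float = 0.6, lam: float = 0.35) -> float:
    """J = (var+ + var-) + lam * max(0, m - (mu+ - mu-)).

    sim:   (P,) predicted similarities for a mini-batch of graph pairs
    match: (P,) booleans, True where both graphs in the pair share a class label
    """
    pos, neg = sim[match], sim[~match]
    variance_term = pos.var() + neg.var()                 # sigma^{2+} + sigma^{2-}
    margin_term = max(0.0, m - (pos.mean() - neg.mean()))  # hinge on mu+ - mu-
    return float(variance_term + lam * margin_term)


match = np.array([True, True, False, False])
# Well-separated pairs: mu+ - mu- = 0.7 > m, so only the variance term remains (0.005).
print(siamese_gcn_loss(np.array([0.9, 0.8, 0.2, 0.1]), match))
# Zero variance but a small gap: 0.35 * (0.6 - 0.2) = 0.14 up to rounding.
print(siamese_gcn_loss(np.array([0.6, 0.6, 0.4, 0.4]), match))
```

The hinge term only becomes active when the gap between the mean matching and non-matching similarities falls below the margin, while the variance term always pushes each group of similarities to be tight.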

Training details:
- Adam optimizer: learning rate 0.001, regularization 0.005
- Loss function: margin $m=0.6$, weight $\lambda=0.35$
- Mini-batch size: 200
Train and test:
1. 871 subjects in total: 720 for training, 151 for testing.
2. Training uses 21802 matching and 21398 non-matching graph pairs; testing uses 5631 matching and 5694 non-matching pairs.
3. All graphs are fed to the network the same number of times to avoid biases.
4. Subjects from all 20 sites are included in both the training and test sets.
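A quick arithmetic check on the pair counts above: 151 test subjects give $\binom{151}{2} = 11325$ unordered pairs, exactly $5631 + 5694$, so the test set appears to use every pair of test subjects. The 43200 training pairs are far fewer than the $\binom{720}{2} = 258840$ possible ones, so training uses a subset (how it was chosen is not stated in these notes). The `split_pairs` helper is hypothetical and only illustrates how pairs divide into matching and non-matching by label.

```python
from itertools import combinations
from math import comb


def split_pairs(labels):
    """Enumerate all unordered subject pairs and split them by whether the labels agree."""
    matching, non_matching = [], []
    for i, j in combinations(range(len(labels)), 2):
        (matching if labels[i] == labels[j] else non_matching).append((i, j))
    return matching, non_matching


print(comb(151, 2), 5631 + 5694)      # 11325 11325
print(comb(720, 2), 21802 + 21398)    # 258840 43200
print(split_pairs(["ASD", "ASD", "TC"]))
```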

Results:

In order to demonstrate the learned metric's ability to facilitate a subject classification task (ASD vs. control), we use a simple k-NN classifier with k = 3 based on the estimated distances. The improvement in classification scores reaches 11.9% on the whole test set and up to 40% for individual sites (compared with the PCA/Euclidean distance baseline).
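A minimal k-NN classifier (k = 3) over a precomputed distance matrix, as a sketch of the evaluation step above. Converting the network's similarity $s$ into a distance as $1 - s$, and the toy labels and values, are assumptions for illustration.

```python
import numpy as np
from collections import Counter


def knn_predict(dist_test_train: np.ndarray, train_labels: np.ndarray, k: int = 3) -> np.ndarray:
    """Majority vote among the k training subjects closest to each test subject."""
    preds = []
    for row in dist_test_train:
        nearest = np.argsort(row)[:k]
        preds.append(Counter(train_labels[nearest].tolist()).most_common(1)[0][0])
    return np.array(preds)


train_labels = np.array(["ASD", "ASD", "TC", "TC", "TC"])
sim = np.array([[0.9, 0.8, 0.1, 0.2, 0.3],   # hypothetical learned similarities, test x train
                [0.1, 0.2, 0.9, 0.8, 0.7]])
dist = 1.0 - sim                             # assumed similarity-to-distance conversion
print(knn_predict(dist, train_labels))       # ['ASD' 'TC']
```

With two classes and an odd k, the majority vote can never tie, which is one practical reason for choosing k = 3.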

Personal thought:

What is the point of the Siamese network? Why not simply train a classifier with GCN and FC layers? (Or perhaps that approach performs worse?)

Revisiting the Unreasonable Effectiveness of Data

23 Jul 2017

Paper: arxiv
Blog post: blog link

Key idea:

We believe that, although challenging, obtaining large scale task-specific data should be the focus of future study.

Some points:

Better Representation Learning Helps.
Performance increases linearly with orders of magnitude of training data (i.e., logarithmically with the volume of data).
Capacity is Crucial. (Network capacity needs to be large to benefit from more data.)
New state-of-the-art results. (A single model, without any bells and whistles, now achieves 37.4 AP compared with 34.3 AP on the COCO detection benchmark.)

Useful Article Archive

22 Jul 2017

This is an archive of some useful articles (blog posts):

Deep Learning

The 9 Deep Learning Papers You Need To Know About: summarizes 9 fundamental papers (AlexNet 2012, ZF Net 2013, VGG Net 2014, GoogLeNet 2015, ResNet 2015, Region Based CNNs 2015, GAN 2014, Generating Image Descriptions 2014, Spatial Transformer Networks 2015)
37 Reasons why your Neural Network is not working

rsync

Hello World!

19 Jul 2017

This is the blog for recording my thoughts and paper reading.

Deep Paper Pool really deep.