# Distance Metric Learning using Graph Convolutional Networks: Application to Functional Brain Networks

Paper: arxiv
Code: github (tensorflow)
Submitted: 7 Mar 2017

### Key Idea:

The authors propose a novel method for learning a similarity metric between irregular graphs with known node correspondences.

### Background knowledge:

Siamese neural networks

A Siamese neural network is a class of neural network architectures that contains two or more identical subnetworks. Identical here means that the subnetworks have the same configuration and share the same parameters and weights, so every parameter update is mirrored across the subnetworks. Siamese networks are popular for tasks that involve finding a similarity or relationship between two comparable things.
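
The weight sharing can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the single `tanh` layer, the layer sizes, and the cosine similarity are not the architecture used in the paper. The point is only that both inputs pass through the same `W`.

```python
import numpy as np

rng = np.random.default_rng(0)

# One set of weights, shared by both branches (hypothetical sizes).
W = rng.normal(size=(8, 4))

def embed(x, W):
    """Map an input vector to an embedding with a single tanh layer."""
    return np.tanh(x @ W)

def siamese_similarity(x1, x2, W):
    """Embed both inputs with the SAME weights, then compare by cosine similarity."""
    z1, z2 = embed(x1, W), embed(x2, W)
    denom = np.linalg.norm(z1) * np.linalg.norm(z2) + 1e-12
    return float(z1 @ z2 / denom)

x_a = rng.normal(size=8)
x_b = rng.normal(size=8)
print(siamese_similarity(x_a, x_b, W))
```

Because the branches share `W`, the similarity is symmetric in its two inputs, and any gradient step on `W` changes both branches at once.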

Degree matrix or diagonal degree matrix

The degree matrix is a diagonal matrix which contains information about the degree of each vertex—that is, the number of edges attached to each vertex.

An adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.

Laplacian matrix / Symmetric normalized Laplacian
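
With degree matrix $D$ and adjacency matrix $A$, the (combinatorial) Laplacian is $L = D - A$ and the symmetric normalized Laplacian is $L^{sym} = I - D^{-1/2} A D^{-1/2}$. A small sketch on a toy three-node path graph (the graph is made up for illustration):

```python
import numpy as np

# Adjacency matrix of a small undirected path graph 0 - 1 - 2 (toy example).
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])

deg = A.sum(axis=1)          # degree of each vertex
D = np.diag(deg)             # diagonal degree matrix
L = D - A                    # combinatorial Laplacian

# Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}.
# Assumes no isolated vertices (every degree > 0).
d_inv_sqrt = 1.0 / np.sqrt(deg)
L_sym = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

print(L)
print(np.linalg.eigvalsh(L_sym))  # eigenvalues of L_sym lie in [0, 2]
```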

Chebyshev polynomials of the first kind
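
These polynomials satisfy the recurrence $T_0(x) = 1$, $T_1(x) = x$, $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$. In spectral graph convolutions they are typically applied to a rescaled Laplacian $\tilde{L} = 2L/\lambda_{max} - I$, so that a filter becomes $y = \sum_k \theta_k T_k(\tilde{L})\,x$. The sketch below assumes that common formulation; the function names, the toy graph, and the coefficients `theta` are hypothetical and not taken from the paper's code.

```python
import numpy as np

def chebyshev_basis(L_scaled, x, K):
    """Return the stack [T_0(L)x, ..., T_{K-1}(L)x] via the three-term recurrence."""
    Tx = [x]
    if K > 1:
        Tx.append(L_scaled @ x)
    for _ in range(2, K):
        Tx.append(2.0 * (L_scaled @ Tx[-1]) - Tx[-2])
    return np.stack(Tx)

def chebyshev_filter(L, x, theta):
    """Filter the graph signal x with a Chebyshev polynomial of the Laplacian L."""
    lmax = np.linalg.eigvalsh(L).max()
    L_scaled = 2.0 * L / lmax - np.eye(len(L))   # spectrum rescaled to [-1, 1]
    basis = chebyshev_basis(L_scaled, x, len(theta))
    return np.tensordot(theta, basis, axes=1)    # sum_k theta_k * T_k(L) x

# Toy path graph 0 - 1 - 2 and a signal with one value per node.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A
x = np.array([1.0, 0.0, -1.0])
theta = np.array([0.5, 0.3, 0.2])                # hypothetical filter coefficients
print(chebyshev_filter(L, x, theta))
```

The recurrence needs only matrix-vector products, which is why this parameterisation avoids an explicit eigendecomposition in practice (the `eigvalsh` call here is just the simplest way to get $\lambda_{max}$ for a tiny graph).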

Check this blog

### Dataset & preprocess:

• Dataset: Autism Brain Imaging Data Exchange (ABIDE)
• Preprocessing pipeline: Configurable Pipeline for the Analysis of Connectomes (C-PAC)
  
Including:
* skull stripping
* slice timing correction
* motion correction
* global mean intensity normalisation
* nuisance signal regression
* band-pass filtering (0.01-0.1Hz)
* registration of fMRI images to standard anatomical space (MNI152)


• ROI:
• Harvard Oxford (HO) atlas (R = 110 cortical and subcortical ROIs)
• Extract the mean time series for each ROI
• Normalise each time series to zero mean and unit variance.
• Number:
  
Number of subjects: N = 871
ASD: 403
Healthy controls: 468
Number of sites: 20
(of the subjects from the different imaging sites, 871 met the imaging quality and phenotypic information criteria)
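
The ROI steps above (normalising each mean time series, then relating ROIs to each other) can be sketched as follows. The random array stands in for real ROI time series, and the number of time points `T` is a made-up value; only R = 110 comes from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
T, R = 150, 110                       # T is a made-up number of time points
ts = rng.normal(size=(T, R))          # placeholder for the mean ROI time series

# Normalise each ROI series to zero mean and unit variance.
ts_z = (ts - ts.mean(axis=0)) / ts.std(axis=0)

# Pearson correlation between all ROI pairs, as an (R, R) matrix.
corr = (ts_z.T @ ts_z) / T

print(corr.shape)
```

Row `corr[i]` is then a length-R vector describing how ROI `i` co-varies with every other ROI.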



### Network detail:

• Task: measure the similarity between two graphs
• Graph:
• Vertex: Each ROI is represented by a node $v_i\in\mathcal{V}$
• Input feature: for each ROI, the input feature is the corresponding row of correlation matrix for that ROI.
• Edge & weight:
• In their paper, they claim that they use $e_{ij}=d(v_i,v_j)=\sqrt{\|v_i-v_j\|^2}$ for weight
• In their code, they used $% $ for weight ($\theta, \mathcal{k}$ are some parameters)
• The edge is determined by k-NN (k-nearest neighbors).
• Network Structure:
1. CNN (graph convolutional layers):
   1. 2 layers with 64 features (shared in the Siamese network)
   2. K=3, convolution takes input at most K steps away from a node.
2. FC:
   1. One output with sigmoid activation $S(x)=\frac{1}{1+e^{-x}}=\frac{e^{x}}{e^{x}+1}$
   2. A binary feature is introduced at the FC layer indicating whether the subject pair was scanned at the same site or not.
   3. Dropout 0.2 on FC
• Loss function:

The loss maximises the mean similarity $\mu^+$ between embeddings belonging to the same class and minimises the mean similarity $\mu^-$ between embeddings belonging to different classes. It also minimises the variance of the pairwise similarities for both matching ($\sigma^{2+}$) and non-matching ($\sigma^{2-}$) pairs of graphs.
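
A loss with these properties can be written as $\mathcal{L} = (\sigma^{2+} + \sigma^{2-}) + \lambda \max\big(0,\, m - (\mu^+ - \mu^-)\big)$, with margin $m$ and weight $\lambda$. The exact form is an assumption here, chosen to be consistent with the description above and with the margin and weight parameters listed in these notes; the function name is hypothetical. A minimal NumPy sketch:

```python
import numpy as np

def global_loss(sim, is_match, m=0.6, lam=0.35):
    """Variance-plus-margin loss over a batch of pairwise similarities.

    sim      : similarity score for each graph pair
    is_match : truthy where the pair belongs to the same class
    m, lam   : margin and weight (defaults follow the values in these notes)
    """
    sim = np.asarray(sim, dtype=float)
    is_match = np.asarray(is_match, dtype=bool)
    pos, neg = sim[is_match], sim[~is_match]
    var_term = pos.var() + neg.var()                         # sigma^{2+} + sigma^{2-}
    margin_term = max(0.0, m - (pos.mean() - neg.mean()))    # hinge on mu^+ - mu^-
    return var_term + lam * margin_term

print(global_loss([0.9, 0.8, 0.2, 0.3], [1, 1, 0, 0]))
```

The hinge term vanishes once the two means are separated by at least the margin, after which only the variances are penalised.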

• Training detail:
• Adam optimizer: 0.001 learning rate and 0.005 regularization
• Loss function: margin m=0.6, weight lambda=0.35
• mini-batch: 200
• Train and test:
1. 871 subjects in total: 720 for training, 151 for testing.
2. The training set forms 21802 matching and 21398 non-matching graph pairs; the test set forms 5631 matching and 5694 non-matching pairs.
3. All graphs are fed to the network the same number of times to avoid biases.
4. Subjects from all 20 sites are included in both the training and test sets.
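
Returning to the graph definition at the start of this section, a k-NN graph with Euclidean distances as edge weights (the weighting stated in the paper) can be sketched as below. The node vectors, the value of `k`, and the symmetrisation step are assumptions for illustration; the notes do not specify them.

```python
import numpy as np

def knn_graph(points, k):
    """Symmetric k-NN adjacency with Euclidean distances as edge weights."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances
    n = len(points)
    W = np.zeros((n, n))
    for i in range(n):
        d = dist[i].copy()
        d[i] = np.inf                             # never pick the node itself
        nbrs = np.argsort(d)[:k]                  # indices of the k nearest nodes
        W[i, nbrs] = dist[i, nbrs]
    # Keep an edge if either endpoint selected the other (assumed symmetrisation).
    return np.maximum(W, W.T)

rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3))                  # hypothetical node vectors
W = knn_graph(points, k=2)
print((W > 0).sum(axis=1))                        # number of neighbours per node
```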

### Results:

To demonstrate that the learned metric helps with a subject classification task (ASD vs. control), the authors use a simple k-NN classifier with k = 3 based on the estimated distances. Compared with PCA/Euclidean distance, classification scores improve by 11.9% on the total test set and by up to 40% for individual sites.
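
A k-NN classifier over precomputed distances takes only a few lines. This is a sketch with made-up distances and labels, not the authors' evaluation code; the function name and the label convention (1 = ASD, 0 = control) are assumptions.

```python
import numpy as np

def knn_predict(dist_test_train, y_train, k=3):
    """Majority vote over the k training subjects closest to each test subject."""
    y_train = np.asarray(y_train)
    nearest = np.argsort(dist_test_train, axis=1)[:, :k]   # (n_test, k) indices
    votes = y_train[nearest]                               # labels, 0 or 1
    return (votes.sum(axis=1) * 2 > k).astype(int)         # strict majority of 1s

# Two test subjects, four training subjects (toy distances).
dist = np.array([[0.1, 0.2, 0.9, 0.8],
                 [0.9, 0.8, 0.1, 0.2]])
y_train = [1, 1, 0, 0]
print(knn_predict(dist, y_train, k=3))
```

With an odd `k` and two classes there are no ties, which is one practical reason to pick k = 3.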

### Personal thought:

• What is the point of the Siamese network? Why not train a classifier directly with GCN and FC layers (or does that perform poorly)?