13 Aug 2017
Graph
Edge $e = (i, j)$ connects vertices $i$ and $j$. It has weight $W_{ij}$.
The Laplacian $L = D - W$ (with $D$ the diagonal degree matrix, $D_{ii} = \sum_j W_{ij}$) is a real symmetric matrix.
It has orthonormal eigenvectors $u_l$ and eigenvalues $\lambda_l$: $L u_l = \lambda_l u_l$.
Assume $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{N-1}$; we denote the entire spectrum by $\sigma(L) = \{\lambda_0, \lambda_1, \ldots, \lambda_{N-1}\}$.
An eigenfunction of a linear operator $A$ is any non-zero function $f$ such that $A f = \lambda f$, where $\lambda$ is a scaling factor called the eigenvalue.
In one-dimensional space, the Laplacian (Laplace operator) is: $\Delta f = \frac{\partial^2 f}{\partial x^2}$.
For the classical Fourier transform:
$$\hat{f}(\xi) = \langle f, e^{2\pi i \xi t} \rangle = \int_{\mathbb{R}} f(t)\, e^{-2\pi i \xi t}\, dt,$$
where $e^{2\pi i \xi t}$ is an eigenfunction of the Laplace operator $\Delta$, since
$$\Delta\left(e^{2\pi i \xi t}\right) = \frac{\partial^2}{\partial t^2}\, e^{2\pi i \xi t} = -(2\pi\xi)^2\, e^{2\pi i \xi t}.$$
For a graph, we can define the graph Fourier transform as
$$\hat{f}(\lambda_l) = \langle f, u_l \rangle = \sum_{i=1}^{N} f(i)\, u_l^*(i),$$
and the inverse graph Fourier transform as
$$f(i) = \sum_{l=0}^{N-1} \hat{f}(\lambda_l)\, u_l(i).$$
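To make the transform pair concrete, here is a minimal NumPy sketch; the toy graph and signal are made up for illustration:

```python
import numpy as np

# Toy undirected graph: symmetric weight matrix W.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
D = np.diag(W.sum(axis=1))          # degree matrix
L = D - W                           # combinatorial Laplacian, real symmetric

# Eigendecomposition: columns of U are the orthonormal eigenvectors u_l,
# eigh returns eigenvalues sorted in ascending order.
lam, U = np.linalg.eigh(L)

f = np.array([0.2, 0.5, 0.1, 0.9])  # a signal on the vertices

f_hat = U.T @ f                     # graph Fourier transform
f_rec = U @ f_hat                   # inverse transform recovers f
assert np.allclose(f, f_rec)
```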
Graph Spectral Filtering
In classical signal processing, a filter is:
$$\hat{f}_{out}(\xi) = \hat{h}(\xi)\, \hat{f}_{in}(\xi),$$
where $\hat{h}(\xi)$ is the transfer function of the filter. By the inverse Fourier transform,
$$f_{out}(t) = \int_{\mathbb{R}} \hat{h}(\xi)\, \hat{f}_{in}(\xi)\, e^{2\pi i \xi t}\, d\xi = (f_{in} * h)(t).$$
For a graph, we define graph spectral (frequency) filtering as
$$\hat{f}_{out}(\lambda_l) = \hat{h}(\lambda_l)\, \hat{f}_{in}(\lambda_l),$$
and by the inverse graph Fourier transform,
$$f_{out}(i) = \sum_{l=0}^{N-1} \hat{h}(\lambda_l)\, \hat{f}_{in}(\lambda_l)\, u_l(i).$$
With some matrix manipulation and orthonormality, we can get:
$$f_{out} = U\, \hat{h}(\Lambda)\, U^\top f_{in} = \hat{h}(L)\, f_{in}, \qquad \Lambda = \mathrm{diag}(\lambda_0, \ldots, \lambda_{N-1}),$$
where $U$ is the matrix whose columns are the eigenvectors $u_l$.
By analogy with the classical convolution theorem (filtering in the frequency domain equals convolution in the original domain), we also define the graph filtering expressions above as convolution on the graph: $(f_{in} * h)_G = U\, \hat{h}(\Lambda)\, U^\top f_{in}$.
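A small sketch of this filtering, assuming an illustrative low-pass transfer function $\hat{h}(\lambda) = e^{-2\lambda}$:

```python
import numpy as np
from scipy.linalg import expm

W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W      # combinatorial Laplacian
lam, U = np.linalg.eigh(L)
f_in = np.array([0.2, 0.5, 0.1, 0.9])

h = np.exp(-2.0 * lam)              # illustrative low-pass transfer function

# Filter in the spectral domain: f_out = U h(Lambda) U^T f_in.
f_out = U @ (h * (U.T @ f_in))

# Equivalent matrix-function form h(L) f_in, here exp(-2L) f_in:
assert np.allclose(f_out, expm(-2.0 * L) @ f_in)
```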
Chebyshev polynomial expansion
Note: we are now using the normalized graph Laplacian $L = I_N - D^{-1/2} W D^{-1/2}$.
Assume we have a filter $g_\theta(\Lambda) = \mathrm{diag}(\theta)$, parameterized in the Fourier domain by $\theta \in \mathbb{R}^N$:
$$g_\theta(L)\, x = U\, g_\theta(\Lambda)\, U^\top x.$$
In order to calculate this, we need the full eigendecomposition of $L$, and each filtering requires a dense multiplication with $U$, which costs $O(N^2)$.
To avoid this, we can treat $g_\theta$ as a function of $\Lambda$ and approximate it by a truncated expansion in terms of Chebyshev polynomials $T_k$ up to $K$-th order:
$$g_\theta(\Lambda) \approx \sum_{k=0}^{K} \theta_k\, T_k(\tilde{\Lambda}),$$
with rescaled $\tilde{\Lambda} = \frac{2}{\lambda_{max}} \Lambda - I_N$ (so the eigenvalues lie in $[-1, 1]$), $\theta \in \mathbb{R}^{K+1}$ as the vector of Chebyshev coefficients, and:
$$T_k(x) = 2x\, T_{k-1}(x) - T_{k-2}(x), \qquad T_0(x) = 1, \quad T_1(x) = x,$$
since $U\, T_k(\tilde{\Lambda})\, U^\top = T_k(\tilde{L})$ with $\tilde{L} = \frac{2}{\lambda_{max}} L - I_N$. Here we have:
$$g_\theta(L)\, x \approx \sum_{k=0}^{K} \theta_k\, T_k(\tilde{L})\, x.$$
Note this expression is K-localized, since it is a $K$-th order polynomial of the Laplacian: it gathers information only from nodes at most $K$ hops away from the central node.
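A minimal NumPy sketch of applying this filter with the Chebyshev recurrence; `theta` and `lam_max` are illustrative inputs, and a real implementation would use sparse matrices:

```python
import numpy as np

def cheb_filter(L, x, theta, lam_max):
    """Apply g_theta(L) x ~= sum_k theta_k T_k(L_tilde) x without
    any eigendecomposition, using only matrix-vector products."""
    N = L.shape[0]
    L_tilde = (2.0 / lam_max) * L - np.eye(N)  # rescale spectrum to [-1, 1]
    Tx_0 = x                                   # T_0(L~) x = x
    out = theta[0] * Tx_0
    if len(theta) > 1:
        Tx_1 = L_tilde @ x                     # T_1(L~) x = L~ x
        out = out + theta[1] * Tx_1
    for k in range(2, len(theta)):
        Tx_2 = 2.0 * (L_tilde @ Tx_1) - Tx_0   # Chebyshev recurrence
        out = out + theta[k] * Tx_2
        Tx_0, Tx_1 = Tx_1, Tx_2
    return out
```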
03 Aug 2017
Paper: arxiv
Key idea:
Here we give clear empirical evidence that training with residual connections accelerates the training of Inception networks significantly.
Some idea:
The authors argue that residual connections are inherently necessary for training very deep convolutional models. Our findings do not seem to support this view, at least for image recognition. In the experimental section we demonstrate that it is not very difficult to train competitive very deep networks without utilizing residual connections. However the use of residual connections seems to improve the training speed greatly, which is alone a great argument for their use.
01 Aug 2017
Paper: arxiv
Code: github (tensorflow)
Key idea:
Graph Convolutional Networks (GCN) for brain analysis in populations, combining imaging and non-imaging data.
Network Outline:
Task: to assign to each acquisition, corresponding to a subject and time point, a label l ∈ L describing the corresponding subject’s disease state (e.g. control or diseased).
Vertex: We represent the population as a graph where each subject is associated with an imaging feature vector and corresponds to a graph vertex.
Edge: The graph edge weights are derived from phenotypic data, and encode the pairwise similarity between subjects and the local neighbourhood system.
The population graph's adjacency matrix $W$ is defined as:
$$W(v, w) = Sim(S_v, S_w) \sum_{h=1}^{H} \gamma\left(M_h(v), M_h(w)\right),$$
where $Sim(S_v, S_w)$ is the similarity between subjects based on imaging measures, $\gamma$ is a measure of distance between phenotypic (non-imaging) measures, and $M = \{M_h\}$ is a set of $H$ non-imaging measures (e.g. subject's gender and age).
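A hedged sketch of this construction; the Gaussian kernel for Sim and the unit-step $\gamma$ for categorical measures are assumptions for illustration (the paper's exact choices differ per dataset), and the names are made up:

```python
import numpy as np

def build_adjacency(imaging_feats, phenotypes):
    """imaging_feats: (n_subjects, n_features) imaging feature vectors;
    phenotypes: (n_subjects, H) categorical non-imaging measures."""
    n = imaging_feats.shape[0]
    # Sim: Gaussian kernel of the correlation distance between imaging features.
    C = np.corrcoef(imaging_feats)
    sim = np.exp(-((1.0 - C) ** 2) / (2.0 * 0.5 ** 2))
    # gamma: unit step, 1 when the categorical phenotypic measures match.
    gamma = np.zeros((n, n))
    for h in range(phenotypes.shape[1]):
        col = phenotypes[:, h]
        gamma += (col[:, None] == col[None, :]).astype(float)
    W = sim * gamma
    np.fill_diagonal(W, 0.0)        # no self-loops
    return W
```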
GCN: check this blog and the paper "Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering".
Training: This structure is used to train a GCN model on partially labelled graphs, aiming to infer the classes of unlabelled nodes from the node features and pairwise associations between subjects.
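A minimal sketch of the semi-supervised objective this implies, assuming a standard masked cross-entropy computed over the labelled nodes only:

```python
import numpy as np

def masked_cross_entropy(softmax_out, labels, labelled_mask):
    """softmax_out: (n_nodes, n_classes) network outputs;
    labels: (n_nodes,) integer labels (arbitrary where unknown);
    labelled_mask: (n_nodes,) bool, True where the label is known."""
    probs = softmax_out[labelled_mask, labels[labelled_mask]]
    return -np.mean(np.log(probs + 1e-12))

# Unlabelled nodes are then assigned the class maximising the softmax output:
# predictions = softmax_out.argmax(axis=1)
```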
Network detail:
- ReLU activation after each graph convolutional layer
- Softmax activation in final layer
- Loss: cross-entropy
- Unlabelled nodes are then assigned the labels maximising the softmax output.
- Dropout
- l2 regularisation
Dataset Detail:
Autism Brain Imaging Data Exchange (ABIDE)
- Task: classify subjects as healthy or suffering from Autism Spectrum Disorder (ASD).
- Objective: exploit the acquisition information which can strongly affect the comparability of subjects.
- Dataset: ABIDE
- Dataset Detail:
- 871 subjects, 403 ASD and 468 healthy controls.
- 20 different sites
- Preprocessing pipeline from C-PAC & ROI from Harvard Oxford (HO) atlas, same as Ruckert2016
- The individual connectivity matrices are estimated by computing the Fisher-transformed Pearson's correlation coefficient between the representative rs-fMRI time series of each ROI in the HO atlas (see the sketch after this list).
- Input feature (vertex): vectorised functional connectivity matrix; a ridge classifier is employed to select the most discriminative features from the training set.
- Adjacency matrix (edge and weight):
- $Sim(S_v, S_w)$ is the correlation distance between the subjects' rs-fMRI connectivity networks after feature selection.
- non-imaging measures: subject’s gender and acquisition site
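As referenced in the list above, a sketch of building the ABIDE vertex features; the function name is illustrative:

```python
import numpy as np

def connectivity_features(timeseries):
    """timeseries: (n_rois, n_timepoints) representative rs-fMRI signals."""
    r = np.corrcoef(timeseries)     # Pearson correlation between ROI pairs
    np.fill_diagonal(r, 0.0)        # arctanh(1) would be infinite
    z = np.arctanh(r)               # Fisher z-transform
    iu = np.triu_indices_from(z, k=1)
    return z[iu]                    # vectorised upper triangle as feature vector
```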
Alzheimer’s Disease Neuroimaging Initiative (ADNI)
- Task: predict whether an MCI patient will convert to AD.
- Objective: demonstrate the importance of exploiting longitudinal information, which can be easily integrated into our graph structure, to increase performance.
- Dataset: ADNI
- Dataset Detail:
- 540 subjects (1675 samples) with early/late MCI and longitudinal T1 MR images; 289 subjects (843 samples) diagnosed with AD.
- Acquisitions after conversion to AD were not included.
- Input feature (vertex): volumes of all 138 segmented brain structures
- Adjacency matrix (edge and weight):
- $Sim(S_v, S_w)$: defined so that edge weights between longitudinal acquisitions of the same subject are maximal, directly encoding the longitudinal information in the graph structure.
- non-imaging measures: subject’s gender and age information
Results
- 10-fold stratified cross validation strategy used.
- Chebyshev polynomials of order K = 3.
- In ADNI, longitudinal acquisitions of the same subject are in the same fold.
Autism Brain Imaging Data Exchange (ABIDE)
- Result: We show how integrating acquisition information allows us to outperform the current state of the art on the whole dataset, with a global accuracy of 69.5%.
Alzheimer’s Disease Neuroimaging Initiative (ADNI)
- Result: an average accuracy of 77%, on par with state-of-the-art results and corresponding to a 10% increase over a standard linear classifier.
30 Jul 2017
Paper: arxiv
Code: github
Submitted: 30 Jun 2016
Key Idea:
We present a formulation of CNNs in the context of spectral graph theory, … design fast localized convolutional filters on graphs.
Claimed contribution:
- Spectral formulation: brings tools from graph signal processing (GSP) to CNNs
- Strictly localized filters: filters strictly localized in a ball of radius K
- Low computational complexity: linear complexity w.r.t. the input data size n. No need to compute the Fourier basis (eigenvalue decomposition) or store it; only the Laplacian, a sparse matrix of |E| non-zero values, needs to be stored.
- Efficient pooling: analogous to pooling of 1D signals, achieved by rearranging the vertices as a binary tree structure.
- Experimental results
Background Knowledge:
- Hilbert Space
A Hilbert space H is a real or complex inner product space that is also a complete metric space with respect to the distance function induced by the inner product.
(wiki) The inner product of functions $f$ and $g$ in $L^2$ is $\langle f, g \rangle = \int f(x)\, \overline{g(x)}\, dx$.
- Square-integrable function
In mathematics, a square-integrable function, also called a quadratically integrable function, is a real- or complex-valued measurable function for which the integral of the square of the absolute value is finite.
A space which is complete under the metric induced by a norm is a Banach space. Therefore, the space of square-integrable functions is a Banach space under the metric induced by the norm. Since that norm is itself induced by the inner product, the space is specifically a Hilbert space.
- For the detailed Chebyshev calculation, check Section III, Part C, "The Chebyshev Polynomial Approximation".
Methodology
Fast localized spectral filters
Graph coarsening
Use the coarsening phase of the Graclus multilevel clustering algorithm with normalized cut as the spectral clustering objective. Graclus' greedy rule consists, at each coarsening level, of picking an unmarked vertex i and matching it with one of its unmarked neighbours j that maximizes the local normalized cut $W_{ij}(1/d_i + 1/d_j)$. The two matched vertices are then marked, and the coarsened weights are set as the sum of their weights.
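A hedged toy sketch of this greedy rule on a dense weight matrix; the real Graclus implementation works on sparse graphs:

```python
import numpy as np

def greedy_coarsen(W):
    """W: (n, n) symmetric weight matrix. Returns matched pairs for one level."""
    n = W.shape[0]
    d = W.sum(axis=1)                     # vertex degrees
    marked = np.zeros(n, dtype=bool)
    pairs = []
    for i in range(n):
        if marked[i]:
            continue
        marked[i] = True
        nbrs = [j for j in range(n) if W[i, j] > 0 and not marked[j]]
        if not nbrs:
            pairs.append((i,))            # singleton: no unmarked neighbour left
            continue
        # pick the neighbour maximizing the local normalized cut W_ij (1/d_i + 1/d_j)
        j = max(nbrs, key=lambda j: W[i, j] * (1.0 / d[i] + 1.0 / d[j]))
        marked[j] = True
        pairs.append((i, j))
    return pairs
```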
Fast Pooling of Graph Signals
After coarsening, the nodes in a pair are pooled (the article uses max pooling); a node not in a pair (called a singleton) is paired with a fake node (initialised with a neutral value, e.g. 0 when using ReLU and max pooling). This gives pooling of size 2; pooling of size 4 can be done by applying 2-pooling twice.
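A minimal sketch of why this is fast: once matched pairs (including fake singleton partners) are stored at adjacent indices, graph pooling reduces to ordinary 1D max pooling of size 2:

```python
import numpy as np

def pool_size2(x):
    """x: (n_vertices, n_features) signal on the reordered graph;
    n_vertices is even after fake nodes (value 0) are appended."""
    return x.reshape(-1, 2, x.shape[1]).max(axis=1)

x = np.array([[1.], [3.], [0.], [2.]])   # index 2 is a fake node holding 0
print(pool_size2(x))                     # [[3.], [2.]]
```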
Result and discussion
MNIST
Using the image 2D grid to build an 8-NN (k-nearest-neighbour) graph gives almost the same performance as a normal CNN on image data (CNN vs. GCNN: 99.33% vs. 99.14%).
They suggest this may be due to the isotropic nature of the spectral filters, i.e. the fact that edges in a general graph do not possess an orientation (like up, down, left and right for pixels on a 2D grid).
20NEWS
Using bag-of-words and other embeddings gets good results.
Findings
Their spectral filter has linear O(n) complexity compared to other filters, and is easily parallelised on GPU, which gives an 8x speedup.
The better the graph quality, the better the result.