Feature Selection for fMRI Classification Chuang Wu Program of Computational Biology Carnegie Mellon University Pittsburgh PA 15213 chuangw andrew cmu edu Abstract The functional Magnetic Resonance Imaging fMRI has provided us with an approach of revealing the activity of brain Due to the large amount of data in fMRI studies feature selection techniques are used to select particular features for classifier In this project Spectral Clustering is implemented to construct features to achieve best reconstruction of the data and be most efficient for making predictions 1 In t r od u ction 1 1 Motivation Over the past decade a variety of different functional Magnetic Resonance Imaging fMRI experiments have been done in order to understand the human brain activity pattern when doing some certain task By recording the activity pattern of human brain as images of 3D voxel we are able to visualize the picture of the pattern find statistical differences in bran activity during different tasks and a more challenge problem is to train the data with a classifier so as to predict brain activity given any of the pattern such as whether the human subject is reading a sentence or looking at a picture or whether the subject is reading an ambiguous or non ambiguous sentence etc Machine Learning is a most powerful approach to train the classifier and then use the classifier to discriminate between different cognitive states A typical fMRI experiment produces a three dimensional image related to the human subject s brain activity every half second The experiment consists of a set of trials and the data is partitioned into trails reading a sentence observing a picture and determining whether the sentence correctly described the picture There are about 6 human subjects in the data each of the 40 trials lasts approximately 30 seconds Only a fraction of the brain of each subject was imaged The data is marked up with 25 30 anatomically defined regions called Regions of Interest Each image contains approximately 5 000 voxels 3D pixels across a portion of the brain Learning this series of brain data to an experimental condition label requires many challenges one of which is the extremely sparse noisy data with extremely high dimensional features This would cause the over fitting problem for the classifier Hence it is necessary to apply some feature selection method to make learning tractable and prevent over fitting due to spurious correlations The objective of feature selection is three fold improving the prediction performance of the predictors providing faster and more cost effective predictors and providing a better understanding of the underlying process that generated the data There are a number of generic feature construction methods including clustering basic linear transforms of the input variables PCA SVD LDA more sophisticated linear transforms like spectral transforms Fourier Hadamard wavelet transforms or convolutions of kernels and applying simple functions to subsets of variables like products to create monomials 1 2 Goal of this work The goal of this work is to derive features with Spectral Clustering from the original brain image data as the input for the classifiers to achieve best reconstruction of the data and be most efficient for making predictions Clustering has long been used for feature construction The idea is to replace a group of similar variables by a cluster centroid which becomes a new derived feature The new derived feature then is treated as a representative for the whole cluster as the new input for the classifier The clustering will greatly reduce the number of features and meanwhile without losing much information The most popular algorithms include K means and hierarchical clustering The clustering method used in this project is Spectral Clustering Spectral methods recently emerge as effective methods for data clustering image segmentation Web ranking analysis and dimension reduction The spectral clustering algorithm is based on the concept of similarity between points instead of distance as other algorithms do The implemented algorithm is formulated as graph partition problem where the weight of each edge is the similarity between points that correspond to vertex connected by the edge The goal of the algorithm is find the minimum weight cuts in the graph but this problem can be addressed by the means of linear algebra in particular by the eigenvalue decomposition techniques from which the term spectral derives 2 2 1 Method Intuition Spectral cluster could tell the intrinsic features of the data revealing the underlying cluster One of the biggest differences between Spectral clustering and K mean clustering is that K mean requires the initial value of K i e the number of clusters The initial value of K is kind of arbitrary in contrast Spectral clustering has a rich structure with interesting properties and deep connections to principal component analysis Hence the goal of this work is to implement a spectral clustering method to cluster the image data in order to reduce the feature number construct new features and improve accuracy of classifier The code for manipulation and visualization of the fMRI data has been provided as well as Na ve Bayes Classifier and Logistic Regression Classifier Hence the work needed to be done in this project is to implement the Spectral Clustering algorithm and combine with the existing code to increase the prediction accuracy 2 2 Description of the algorithms The algorithm starts with well motivated objective functions optimization eventually leads to eigenvectors with many clear and interesting algebraic properties At the core of spectral clustering is the Laplacian of the graph adjacency pairwise similarity matrix evolved from spectral graph partitioning The detailed steps for the Spectral Clustering in this project are 2 1 Constructing a matrix M in which rows are corresponding to image and the columns are corresponding to voxels 2 Normalize over the column if the affinity is defined by Euclidean distance 3 Construct Affinity Matrix A The affinity is defined as Aij exp C i j 2 2 2 if i j and Aii 0 where C i j is the correlation between the two vectors of voxels 4 Define D to be the diagonal matrix whose i i element is the sum of A s i th row and construct the matrix L D 1 2AD 1 2 5 Find x1 x2 xk the k largest eigenvectors of L chosen to be orthogonal to each other in the case of repeated eigenvalues and form the matrix X x1 x2 xk by
View Full Document