Contents
1 Introduction
2 Related Work
3 Methods
3.1 Feature Selection
3.1.1 Activity Based
3.1.2 Principal Component Analysis (PCA)
3.1.3 Active Voxel Time Averaging (AVTA)
3.2 Classifiers and Training
3.2.1 Tree Augmented Naïve Bayes (TAN)
3.2.2 Naïve Bayes (NB)
3.3 Classification
3.3.1 NB classifier
3.3.2 TAN classifier
4 Experiments and Results
4.1 Activity based feature selection
4.2 Principal Component Analysis
4.3 Active Voxel Time Averaging
5 Conclusion
6 Acknowledgements
References

Tree Augmented Naïve Bayesian Classifier with Feature Selection for fMRI Data

Aabid Shariff                    Ahmet Saglam
[email protected]                  [email protected]

Abstract

Functional Magnetic Resonance Imaging of the brain produces a vast amount of data that could help in understanding cognitive processes. In order to achieve this, the problem is cast as a classification problem. Here, we implement Tree Augmented Naïve Bayes to increase the accuracy of the previously implemented Naïve Bayes. We also use activity based feature selection and Principal Component Analysis to reduce the dimensionality of the data and increase accuracy. We show that the TAN classifier performs better than the NB classifier. We have also modified the activity based feature selection method and shown a significant improvement in classification accuracy.

1 Introduction

Functional Magnetic Resonance Imaging (fMRI) is a powerful technique known to represent neural activity in the brain indirectly. Figure 1 illustrates fMRI data with an instantaneous image of a slice of the brain and the change in activity in a volume of the brain. Although this data does not provide single-neuron resolution of neural activity, many studies have reported the use of this data to identify cognition. These kinds of studies are important for the understanding of cognitive processes, medical diagnostics (e.g. in Alzheimer's disease), etc.
The basis of these studies is the existence of anatomically distinct regions in the brain for the distinct functions it carries out, which reflect particular cognitive processes. We can now use classification methods to understand the mapping of brain activity to cognition. fMRI data has been very useful as an input to classification algorithms. Some problems with fMRI data are that it is high dimensional, noisy, and sparse. This project aims at implementing a Tree Augmented Naïve Bayes classifier to increase prediction accuracy compared to the naïve Bayes classifier while addressing the above issues associated with the data.

Figure 1: fMRI data from subject 05710: (a) image of a slice of the brain and (b) time change of activity in a voxel.

2 Related Work

Recent work has trained several methods to classify the cognitive states of the brain of a human subject using fMRI data (Mitchell, 2004). The tasks defined in the study were classifying the states of looking at a sentence versus looking at a picture, reading an ambiguous sentence versus reading a non-ambiguous sentence, and viewing a word describing one of several categories. The study made use of Naïve Bayes (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (kNN) machine learning methods. The group also applied four different types of feature selection methods in accordance with the nature of the data. These methods were based on the cognitive state discriminative ability of voxels, the activity of voxels at a given cognitive state, the activity of voxels categorized by Regions of Interest in the brain, and the mean of active voxels for each region of interest. In the first type of classification problem, discriminating between looking at a picture or a sentence, the highest prediction accuracy, 89%, was achieved by SVM when feature selection was performed.

Figure 2: Structure of the classifiers used in this study.
C is the class variable, the A's are features, and arrows indicate dependence among variables.

In the above study, the NB classifier performed with 82% accuracy when feature selection was performed. This simple classifier makes independence assumptions among all features of the data given the class variable. So, as one would expect, forsaking some of the irrelevant independence assumptions between some of the features may improve the accuracy of the learner. One method aiming at reducing the number of such unwarranted independence assumptions is the Tree Augmented Naïve Bayes (TAN) classifier described by Friedman et al. The structure and relations between the class variable and the features in the NB and TAN models are shown in Figure 2.

The procedure described in Friedman et al.'s work is based on Chow and Liu's method for finding dependence relations among variables, so as to be able to factorize a joint probability distribution over these variables. The authors perform experiments on 25 different cases and show that the method provides higher accuracy than the Naïve Bayes classifier in two out of three cases.

The Construct-TAN procedure, described by Friedman et al. and others, for learning a TAN classifier has time and space complexity of the order O(n²N), where n is the number of features and N is the number of examples. Later, Meila and Shi et al. independently modified the algorithm to decrease its computational cost, based on some assumptions about and requirements on the data. Meila's improvement accelerated the algorithm, reducing the time complexity to O(s²N log(s²N/n)), where s is a constant related to the sparsity of the data and s << n. So, as becomes obvious, the acceleration takes advantage of sparsity in the feature vectors of the examples, which can be illustrated with the well-known text classification problem. The more recent study of Shi et al. reduced the space complexity of the algorithm based on Meila's work. In all these studies, the use of TAN is described for discrete data. Yang et al.
have studied discretization methods for the naïve Bayes classifier. This is still a field of current research, since the performance of methods varies with the type of
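The core of the Construct-TAN procedure discussed above can be sketched briefly: compute the conditional mutual information I(A_i; A_j | C) for every pair of features (the O(n²N) step), then take a maximum spanning tree over those pairwise weights, following Chow and Liu. The sketch below is illustrative only, assuming small discrete feature columns; the function names are ours, not from the paper or from Friedman et al.

```python
import itertools
import math
from collections import Counter

def cond_mutual_info(xs, ys, cs):
    """Empirical conditional mutual information I(X; Y | C) in nats,
    estimated from three parallel lists of discrete values."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    mi = 0.0
    for (x, y, c), count in n_xyc.items():
        # p(x,y,c) * log[ p(x,y|c) / (p(x|c) p(y|c)) ], written with raw counts
        mi += (count / n) * math.log(count * n_c[c] / (n_xc[x, c] * n_yc[y, c]))
    return mi

def construct_tan_edges(features, labels):
    """Undirected maximum spanning tree over pairwise I(A_i; A_j | C),
    the core step of the Construct-TAN procedure.
    `features` is a list of columns (one list of values per feature)."""
    n = len(features)
    # All pairwise conditional mutual informations: this is the O(n^2 N) step.
    weights = {
        (i, j): cond_mutual_info(features[i], features[j], labels)
        for i, j in itertools.combinations(range(n), 2)
    }
    # Kruskal-style maximum spanning tree with a union-find over features.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    tree = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

To complete the TAN structure, one would then root the tree at an arbitrary feature, direct all edges away from the root, and add the class variable as a parent of every feature, as in Friedman et al.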