
Contents
• Classification
• Two Different Approaches
• Bayesian Classification
• Classifying Mitochondrial Proteins
• Let's Look at Just One Feature
• The First Key Concept
• The Second Key Concept
• But How Do We Classify?
• Bayes Rule
• Bayes Decision Rule
• Discriminant Function
• Stepping Back
• Two Fundamental Tasks
• The All Important Training Set
• Getting P(X|Class) from the Training Set
• Getting Priors
• We Are Just About There…
• But What About Multiple Features?
• Distributions Over Many Features
• Naïve Bayes Classifier
• Naïve Bayes Discriminant Function
• Individual Feature Distributions
• Classifying a New Protein
• Maestro Results
• How Good Is the Classifier? Binary Classification Errors
• Maestro Outperforms Existing Classifiers
• Support Vector Machines (SVMs)
• SVM Formulation
• An Optimization Problem
• Using an SVM
• Non-linear Classifier
• Kernel Mapping
• Kernels
• Example Kernels
• Using (Non-Linear) SVMs
• Classifying Tumors with Array Data
• Weighted Voting Classification
• Results
• Bringing Clustering and Classification Together

MIT OpenCourseWare, http://ocw.mit.edu
6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution, Fall 2008
For information about citing these materials or our Terms of Use, visit http://ocw.mit.edu/terms.

Classification
Lecture 5, September 18, 2008
Computational Biology: Genomes, Networks, Evolution

Two Different Approaches
• Generative
  – Bayesian classification and Naïve Bayes
  – Example: mitochondrial protein prediction
• Discriminative
  – Support Vector Machines
  – Example: tumor classification

Bayesian Classification
We pose the classification problem in probabilistic terms: we create models for how features are distributed for objects of different classes, and we use probability calculus to make classification decisions.

Classifying Mitochondrial Proteins
Maestro derives 7 features for all human proteins (targeting signal, protein domains, mass spec, co-expression, homology, induction, motifs) and uses them to predict nuclear-encoded mitochondrial genes.
(First page of article removed due to copyright restrictions: Calvo, S., et al. "Systematic Identification of Human Mitochondrial Disease Genes Through Integrative Genomics." Nature Genetics 38 (2006): 576-582.)

Let's Look at Just One Feature
• Each object can be associated with multiple features.
• We will look at the case of just one feature for now.
We are going to define two key concepts.
(Figure: proteins plotted by co-expression and conservation.)

The First Key Concept
Features for each class are drawn from class-conditional probability distributions (CCPDs): P(X|Class1) and P(X|Class2).
Our first goal will be to model these distributions.

The Second Key Concept
We model prior probabilities to quantify the expected a priori chance of seeing a class.
P(mito) = how likely the next protein is to be a mitochondrial protein before I see any features to help me decide.
We expect ~1500 mitochondrial genes out of ~21000 total, so
P(mito) = 1500/21000 and P(~mito) = 19500/21000.

But How Do We Classify?
• We have priors defining the a priori probability of each class: P(Class1), P(Class2).
• We also have models for the probability of a feature given each class: P(X|Class1), P(X|Class2).
But we want the probability of the class given a feature. How do we get P(Class1|X)?

Bayes Rule
P(Class|Feature) = P(Feature|Class) P(Class) / P(Feature)
Belief after evidence (posterior) = evaluate evidence (likelihood) × belief before evidence (prior) / evidence.
Bayes, Thomas (1763). "An essay towards solving a problem in the doctrine of chances." Philosophical Transactions of the Royal Society of London 53:370-418.
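As a small numerical illustration (not part of the original slides), the sketch below applies Bayes rule to the mitochondrial example to get the posterior P(mito|X) for one observed feature value. The priors are the counts quoted above; the two likelihood values are made-up placeholders.

```python
# Hedged sketch: Bayes' rule for the mitochondrial example.
# The likelihood values below are illustrative assumptions, not lecture values.

p_mito = 1500 / 21000        # prior P(mito), from the slide
p_not_mito = 19500 / 21000   # prior P(~mito)

# Assumed class-conditional likelihoods for one observed feature value X
p_x_given_mito = 0.30        # P(X | mito)  -- hypothetical
p_x_given_not = 0.05         # P(X | ~mito) -- hypothetical

# Evidence P(X) via the law of total probability
p_x = p_x_given_mito * p_mito + p_x_given_not * p_not_mito

# Posterior: belief after seeing the evidence
p_mito_given_x = p_x_given_mito * p_mito / p_x
print(f"P(mito | X) = {p_mito_given_x:.3f}")
```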
Bayes Decision Rule
If we observe an object with feature X, how do we decide if the object is from Class 1?
The Bayes Decision Rule is simply: choose Class1 if
P(Class1|X) > P(Class2|X)
i.e., P(X|Class1) P(Class1) / P(X) > P(X|Class2) P(Class2) / P(X)
i.e., P(X|Class1) P(Class1) > P(X|Class2) P(Class2)
P(X) is the same number on both sides, so it cancels.

Discriminant Function
We can create a convenient representation of the Bayes Decision Rule:
P(X|Class1) P(Class1) > P(X|Class2) P(Class2)
⇔ P(X|Class1) P(Class1) / [P(X|Class2) P(Class2)] > 1
⇔ G(X) = log { P(X|Class1) P(Class1) / [P(X|Class2) P(Class2)] } > 0
If G(X) > 0, we classify as Class 1.

Stepping Back
What do we have so far? The class-conditional distributions P(X|Class1), P(X|Class2), the priors P(Class1), P(Class2), and the discriminant function
G(X) = log { P(X|Class1) P(Class1) / [P(X|Class2) P(Class2)] }.
Given a new feature X, we plug it into this equation, and if G(X) > 0 we classify as Class1.
We have defined the two components, class-conditional distributions and priors, and we have used Bayes Rule to create a discriminant function for classification from these components.

Two Fundamental Tasks
• We need to estimate the needed probability distributions:
  – P(X|Mito) and P(X|~Mito)
  – P(Mito) and P(~Mito)
• We need to assess the accuracy of the classifier:
  – How well does it classify new objects?

The All Important Training Set
Building a classifier requires a set of labeled data points called the training set.
The quality of the classifier depends on the number of training set data points; how many data points you need depends on the problem.
The training set is needed both to build and to test the classifier.

Getting P(X|Class) from the Training Set
How do we get P(X|Class1) from the training points? In general, and especially for continuous distributions, this can be a complicated problem (density estimation).
One simple approach: divide the X values into bins and then simply count frequencies. With 13 Class1 data points:

  Bin:        <1     1-3    3-5    5-7    >7
  Frequency:  2/13   0/13   7/13   3/13   1/13

Getting Priors
Three general approaches:
1. Estimate priors by counting the fraction of each class in the training set, e.g. 13 Class1 and 10 Class2 points give P(Class1) = 13/23 and P(Class2) = 10/23. But sometimes the fractions in the training set are not representative of the world.
2. Estimate from "expert" knowledge, e.g. P(mito) = 1500/21000 and P(~mito) = 19500/21000.
3. We have no idea: use equal (uninformative) priors, P(Class1) = P(Class2).

We Are Just About There…
We have created the class-conditional distributions and priors, P(X|Class1), P(X|Class2), P(Class1), P(Class2), and we are ready to plug these into our discriminant function
G(X) = log { P(X|Class1) P(Class1) / [P(X|Class2) P(Class2)] } > 0.
But there is one more little complication…

But What About Multiple Features?
• We have focused on a single feature for an object.
• But mitochondrial protein prediction (for example) has 7 features: targeting signal, protein domains, mass spec, co-expression, homology, induction, motifs.
So P(X|Class) becomes P(X1, X2, X3, ..., X7|Class) and our discriminant function becomes
G(X1, ..., X7) = log { P(X1, X2, ..., X7|Class1) P(Class1) / [P(X1, X2, ..., X7|Class2) P(Class2)] } > 0.

Distributions Over Many Features
• Assume each feature is binned into 5 possible values.
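Before moving on to many features, here is a minimal Python sketch (not from the lecture) that ties together the single-feature pieces above: binned class-conditional distributions estimated from a labeled training set, priors counted from class fractions, and the log-ratio discriminant G(X). The training values, bin edges, and pseudocount are illustrative assumptions.

```python
# Hedged sketch: single-feature Bayes classifier with binned (histogram)
# class-conditional distributions, as in the "one simple approach" slide.
# All training values below are made up for illustration.
import bisect
import math

# Bin edges for the 5 bins: <1, 1-3, 3-5, 5-7, >7
EDGES = [1, 3, 5, 7]

def bin_index(x):
    """Map a feature value to one of the 5 bins (0..4)."""
    return bisect.bisect_right(EDGES, x)

def estimate_ccpd(values, pseudocount=1.0):
    """Estimate P(X|Class) by counting bin frequencies.
    A small pseudocount (an assumption beyond the slides) avoids zero
    probabilities in empty bins."""
    counts = [pseudocount] * (len(EDGES) + 1)
    for v in values:
        counts[bin_index(v)] += 1
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical labeled training set (feature values for each class)
class1_train = [3.2, 4.1, 0.5, 3.8, 6.2, 4.4, 2.9]
class2_train = [0.2, 0.8, 1.5, 8.3, 1.1]

p_x_class1 = estimate_ccpd(class1_train)
p_x_class2 = estimate_ccpd(class2_train)

# Priors from training-set fractions (approach 1 on the "Getting Priors" slide)
n1, n2 = len(class1_train), len(class2_train)
p_class1, p_class2 = n1 / (n1 + n2), n2 / (n1 + n2)

def discriminant(x):
    """G(X) = log[ P(X|Class1) P(Class1) / (P(X|Class2) P(Class2)) ]"""
    b = bin_index(x)
    return math.log(p_x_class1[b] * p_class1) - math.log(p_x_class2[b] * p_class2)

x_new = 4.6
print("Class1" if discriminant(x_new) > 0 else "Class2")
```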

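The joint distribution P(X1, ..., X7|Class) over many binned features is hard to estimate directly from a modest training set; the Naïve Bayes classifier named in the outline handles this by assuming the features are independent given the class, so the discriminant becomes a sum of per-feature log ratios plus the log prior ratio. Below is a minimal sketch under that assumption; the per-feature probability tables and the example feature vector are hypothetical placeholders.

```python
# Hedged sketch: Naive Bayes discriminant for 7 binned features.
# Assumes conditional independence of features given the class, so
# G(X1..X7) = sum_i log[ P(Xi|Class1) / P(Xi|Class2) ] + log[ P(Class1) / P(Class2) ].
# The probability tables below are hypothetical placeholders.
import math

# One 5-bin class-conditional table per feature; in practice each table
# would be estimated from the training set as in the sketch above.
p_feat_class1 = [[0.05, 0.10, 0.40, 0.30, 0.15] for _ in range(7)]  # P(Xi = bin | Class1)
p_feat_class2 = [[0.30, 0.35, 0.20, 0.10, 0.05] for _ in range(7)]  # P(Xi = bin | Class2)

p_class1, p_class2 = 1500 / 21000, 19500 / 21000  # priors from the slide

def naive_bayes_discriminant(bins):
    """bins: list of 7 bin indices (0..4), one per feature."""
    g = math.log(p_class1 / p_class2)
    for i, b in enumerate(bins):
        g += math.log(p_feat_class1[i][b] / p_feat_class2[i][b])
    return g

x_new = [2, 3, 2, 4, 2, 3, 2]  # hypothetical binned feature vector
print("mito" if naive_bayes_discriminant(x_new) > 0 else "~mito")
```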
