Computational Biology, Part 7: Supervised Machine Learning and Searching for Sequence Families

What is Machine Learning?
Fundamental Question of Machine Learning
Why Machine Learning?
Successful Machine Learning Applications
Machine Learning Paradigms
Supervised Learning
Classification vs. Regression
Representation
Formal description
Inductive learning hypothesis
Hypothesis space
k-Nearest Neighbor (kNN)
Linear Discriminants
Decision trees
Support vector machines
Support Vector Machines (SVMs)
Cross-Validation
Describing classifier errors
Confusion matrix - binary
Precision-recall analysis
Confusion matrix – multi-class
Ground truth
Stating Goals vs. Approaches
Resources
Goals for sequence families
Possible Approaches
PSSMs
Learning PSSMs
Position Specific Iterated BLAST (PSI-BLAST)
Problems with PSSMs
Cobbling
Cobbler Illustration
Family Pairwise Search
Which method is best?
Comparison Protocol
Evaluation
Evaluation metric - ROC
Example of Evaluation for ROC2
Protocol for Comparison of Methods
Results
Conclusion
Comparison Protocol
Which is best (part 2)?
Conclusions

Robert F. Murphy
Copyright 2008-2009.
All rights reserved.
www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf

What is Machine Learning?
- Fundamental Question of Computer Science: How can we build machines that solve problems, and which problems are inherently tractable/intractable?
- Fundamental Question of Statistics: What can be inferred from data plus a set of modeling assumptions, with what reliability?
(Tom Mitchell white paper)

Fundamental Question of Machine Learning
- How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes? (Tom Mitchell)
(Tom Mitchell white paper)

Why Machine Learning?
- Learn relationships from large sets of complex data: data mining
  - Predict clinical outcome from tests
  - Decide whether someone is a good credit risk
- Do tasks too complex to program by hand
  - Autonomous driving
- Customize programs to user needs
  - Recommend book/movie based on previous likes
(Tom Mitchell white paper)

Why Machine Learning?
- Economically efficient
- Can consider larger data spaces and hypothesis spaces than people can
- Can formalize learning problem to explicitly identify/describe goals and criteria

Successful Machine Learning Applications
- Speech recognition
  - Telephone menu navigation
- Computer vision
  - Mail sorting
- Bio-surveillance
  - Identifying disease outbreaks
- Robot control
  - Autonomous driving
- Empirical science
(Tom Mitchell white paper)

Machine Learning Paradigms
- Supervised Learning
  - Classification
  - Regression
- Unsupervised Learning
  - Clustering
- Semi-supervised Learning
  - Cotraining
- Active learning

Supervised Learning
- Approaches
  - Classification (discrete predictions)
  - Regression (continuous predictions)
- Common considerations
  - Representation (features)
  - Feature selection
  - Functional form
  - Evaluation of predictive power
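The split between discrete and continuous predictions can be illustrated with a single learner used both ways. Below is a minimal Python sketch, assuming a k-nearest-neighbor (kNN) predictor with Euclidean distance; it is an illustration, not an implementation taken from these slides. The same neighbor search yields a classification when the neighbors' labels are combined by majority vote and a regression when their numeric targets are averaged. The function names, the default `k = 3`, and the toy data are all hypothetical.

```python
import math
from collections import Counter

def k_nearest(train_x, query, k):
    """Return indices of the k training instances closest to query (Euclidean distance)."""
    dists = sorted((math.dist(x, query), i) for i, x in enumerate(train_x))
    return [i for _, i in dists[:k]]

def knn_classify(train_x, train_labels, query, k=3):
    """Classification: a discrete prediction by majority vote among the k neighbors."""
    idx = k_nearest(train_x, query, k)
    votes = Counter(train_labels[i] for i in idx)
    return votes.most_common(1)[0][0]

def knn_regress(train_x, train_targets, query, k=3):
    """Regression: a continuous prediction as the mean of the k neighbors' targets."""
    idx = k_nearest(train_x, query, k)
    return sum(train_targets[i] for i in idx) / k

# Hypothetical toy data: each instance is described by two made-up feature values.
train_x = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (6.0, 9.0), (1.2, 0.8), (5.5, 7.5)]
# Discrete target for classification (invented 0/1 labels).
train_labels = [0, 0, 1, 1, 0, 1]
# Continuous target for regression (invented numeric outcomes).
train_targets = [30.0, 28.0, 4.0, 3.0, 35.0, 5.0]

query = (5.2, 7.9)
print(knn_classify(train_x, train_labels, query))  # → 1
print(knn_regress(train_x, train_targets, query))  # → 4.0
```

Only the aggregation step differs between the two functions; the representation (feature vectors) and the neighbor search are shared, which is why the common considerations listed above apply to both classification and regression.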
Classification vs. Regression
- If I want to predict whether a patient will die from a disease within six months, that is classification.
- If I want to predict how long the patient will live, that is regression.

Representation
- Definition of the thing or things to be predicted
  - Classification: classes
  - Regression: regression variable
- Definition of the things (instances) to make predictions for
  - Individuals
  - Families
  - Neighborhoods, etc.
- Choice of descriptors (features) to describe different aspects of instances

Formal description
- Define $X$ as a set of instances $x$ described by features
- Given training examples $D$ from $X$
- Given a target function $c: X \rightarrow \{0,1\}$
- Given a hypothesis space $H$
- Determine a hypothesis $h$ in $H$ such that $h(x) = c(x)$ for all $x$ in $D$
(Courtesy Tom Mitchell)

Inductive learning hypothesis
- Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
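The formal description can be tried out on a deliberately tiny example. The Python sketch below assumes instances described by a single numeric feature, a target function $c$ with values in {0,1}, and a hypothetical hypothesis space $H$ of threshold rules $h_t(x) = 1$ if $x \ge t$, else $0$, with $t$ restricted to a grid of eleven values. It enumerates $H$ and keeps every hypothesis with $h(x) = c(x)$ for all $x$ in $D$. The thresholds, the training data, and the function names are made up for illustration.

```python
from typing import Callable, List, Tuple

# A training example pairs an instance x with its label c(x) in {0, 1}.
Example = Tuple[float, int]

def make_threshold_rule(t: float) -> Callable[[float], int]:
    """Hypothetical hypothesis h_t: predict 1 when the feature is at least t."""
    return lambda x: 1 if x >= t else 0

def consistent_hypotheses(thresholds: List[float], D: List[Example]) -> List[float]:
    """Return the thresholds t whose rule h_t agrees with c on every example in D."""
    consistent = []
    for t in thresholds:
        h = make_threshold_rule(t)
        if all(h(x) == label for x, label in D):
            consistent.append(t)
    return consistent

# Finite hypothesis space H: threshold rules at 0.0, 0.5, ..., 5.0 (an assumption).
H = [i * 0.5 for i in range(11)]
# Made-up training examples D drawn from X, labeled by an unknown target c.
D = [(0.7, 0), (1.9, 0), (3.1, 1), (4.4, 1)]

print(consistent_hypotheses(H, D))  # → [2.0, 2.5, 3.0]

# Consistent hypotheses can still disagree on an unseen instance such as x = 2.2.
for t in consistent_hypotheses(H, D):
    print(t, make_threshold_rule(t)(2.2))
```

With only four training examples, three rules fit $D$ perfectly yet disagree on $x = 2.2$ (the rule with $t = 2.0$ predicts 1, the other two predict 0); additional training examples can narrow the set of consistent hypotheses.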