Supervised Machine Learning: Classification Techniques
Chaleece Sandberg
Chris Bradley
Kyle Walsh

Supervised Machine Learning
- SML: a machine performs a function (e.g., classification) after training on a data set where both the inputs and the desired outputs are provided
- Following training, the SML algorithm is able to generalize to new, unseen data
- Application: "data mining"
  - Large amounts of data often must be handled efficiently
  - Look for relevant information and patterns in the data

Decision Trees
- Logic-based algorithm
- Sorts instances (data) according to feature values... a hierarchy of tests
- Nodes: features
  - Root node: the feature that best divides the data (algorithms exist for determining the best root node)
- Branches: the values a node can assume

Decision Trees, an example
[Figure: a diagnostic decision tree. INPUT: data (symptom). The root tests for a low RBC count; internal nodes test features such as size of cells (large/small), bleeding?, B12 deficient?, and a gastrin assay (positive/negative); leaves are STOP states or diagnoses such as anemia. OUTPUT: category (condition).]

Decision Trees: Assessment
- Advantages:
  - Classification of data based on limiting features is intuitive
  - Handles discrete/categorical features best
- Limitations:
  - Danger of "overfitting" the data
  - Not the best choice for accuracy

Bayesian Networks
- A graphical algorithm that encodes the joint probability distribution of a data set
- Captures probabilistic relationships between variables
- Classification is based on the probability that an instance (data point) belongs in each category

Bayesian Networks, an example
[Figure: an example Bayesian network (Wikipedia, 2008).]

Bayesian Networks: Assessment
- Advantages:
  - Takes into account prior information regarding relationships among features
  - Probabilities can be updated based on outcomes
  - Fast! ...with respect to learning classification
  - Can handle incomplete sets of data
  - Avoids "overfitting" of data
- Limitations:
  - Not suitable for data sets with many features
  - Not the best choice for accuracy

Neural Networks
- Used for:
  - Classification
  - Noise reduction
  - Prediction
- Great because they are able to learn and able to generalize
- Kiran: Plaut's (1996) semantic neural network that could be lesioned and retrained – useful for predicting treatment outcomes
- Miikkulainen: an evolving neural network that could adapt to the gaming environment – a useful learning application

Neural Networks: Biological Basis
[Figure: the biological neuron that inspires artificial network units.]

Feed-forward Neural Network
[Figure: a feed-forward network, from perceptron input units through a hidden layer to the output.]

Neural Networks: Training
- Training: presenting the network with sample data and modifying the weights to better approximate the desired function
- Supervised learning:
  - Supply the network with inputs and desired outputs
  - Initially, the weights are set randomly
  - Weights are then modified to reduce the difference between the actual and desired outputs (see the sketch after the next slide)

Backpropagation
[Figure: the output error is propagated backward through the network, layer by layer, to compute each weight's update.]
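To make the training procedure concrete, here is a minimal backpropagation sketch. It is not from the slides: the XOR task, the 2-4-1 sigmoid architecture, and the hyperparameters are illustrative assumptions.

```python
# Minimal backpropagation sketch: a 2-4-1 sigmoid network learning XOR.
# Task, layer sizes, learning rate, and step count are assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # desired outputs

# Initially, the weights are set randomly (as on the Training slide).
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(10000):
    # Forward pass: compute the network's actual output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error (actual - desired) from the
    # output layer back through the hidden layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Modify the weights to reduce the output error.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should end up close to the XOR targets 0, 1, 1, 0
```

Each pass computes the actual outputs, compares them to the desired outputs, and nudges every weight in the direction that reduces the squared error, which is exactly the loop described on the Training slide.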
Support Vector Machines

Perceptron Revisited: Linear Classifier
- y(x) = sign(w·x + b)
- The hyperplane w·x + b = 0 divides the space into a region where w·x + b > 0 (one class) and a region where w·x + b < 0 (the other)

Which one is the best?
[Figure: many different hyperplanes separate the same training data equally well; which should we pick?]

Notion of Margin
- The distance from a data point x to the hyperplane is r = |w·x + b| / ‖w‖
- The data points closest to the boundary are called support vectors
- The margin d is the distance between the two classes

Maximizing Margin
- Maximizing the margin is a quadratic optimization problem
- Quadratic optimization problems are a well-known class of mathematical programming problems, and many (rather intricate) algorithms exist for solving them

Kernel Trick
- What if the dataset is not linearly separable? We use a kernel to map the data to a higher-dimensional space
[Figure: 1-D points on the x axis become linearly separable after the mapping x → (x, x²).]

Non-linear SVMs: Feature Spaces
- General idea: the original space can always be mapped to some higher-dimensional feature space where the training set becomes separable:
  Φ: x → φ(x)

Examples of the Kernel Trick
- For the example in the previous figure, the non-linear mapping is φ(x) = (x, x²)
- A more commonly used kernel is the radial basis function (RBF) kernel:
  K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²))
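The following sketch ties the margin and kernel slides together using scikit-learn (a library choice and toy data set assumed here, not taken from the slides):

```python
# Sketch: fit a maximum-margin linear SVM, then swap in an RBF kernel.
# The data, C, and gamma values are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),   # class -1 cluster
               rng.normal(+2.0, 0.5, size=(20, 2))])  # class +1 cluster
y = np.array([-1] * 20 + [+1] * 20)

# A very large C approximates the hard-margin quadratic program.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w = clf.coef_[0]

print("number of support vectors:", len(clf.support_vectors_))
print("margin d = 2/||w|| =", 2.0 / np.linalg.norm(w))

# Kernel trick: for non-linearly separable data, replace the dot product
# with K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)); scikit-learn's
# gamma parameter corresponds to 1 / (2 sigma^2).
clf_rbf = SVC(kernel="rbf", gamma=0.5).fit(X, y)
```

Under the hood, the fit solves the quadratic program of minimizing ‖w‖² subject to y_i(w·x_i + b) ≥ 1; the support vectors are the training points for which that constraint holds with equality.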
Advantages and Applications of SVM
- Advantages of SVM:
  - Unlike neural networks, the class boundaries do not change as the weights change
  - Generalizability is high because the margin is maximized
  - No local minima, and robustness to outliers
- Applications of SVM:
  - Used in almost every conceivable situation where automatic classification of data is needed
  - (Example from class) Raymond Mooney and his KRISPER natural language parser

The Future of Supervised Learning (1)
- Generation of synthetic data
  - A major problem with supervised learning is the need for large amounts of training data to obtain a good result
  - Why not create synthetic training data from real, labeled data?
  - Example: use a 3D model to generate multiple 2D images of an object (such as a face) under different conditions (such as lighting); labeling only needs to be done for the 3D model, not for every 2D image

The Future of Supervised Learning (2)
- Future applications:
  - Personal software assistants that learn their users' evolving interests from past usage in order to highlight relevant news (e.g., filtering scientific journals for articles of interest)
  - Houses that learn from experience to optimize energy costs based on the particular usage patterns of their occupants
  - Analysis of medical records to assess which treatments are more effective for new diseases
  - Robots that interact better with humans

References
- http://homepage.psy.utexas.edu/homepage/class/Psy394U/Hayhoe/cognitive%20science%202008/talks:readings/
- http://www.ai-junkie.com/ann/evolved/nnt1.html
- http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
- http://cbcl.mit.edu/cbcl/people/heisele/huang-blanz-heisele.pdf