CMU CS 10701: Bayes optimal classifier, Naïve Bayes, What's learning, revisited (12 pages)
School: Carnegie Mellon University
Course: CS 10701, Introduction to Machine Learning
Bayes optimal classifier, Naïve Bayes, What's learning, revisited
Machine Learning 10-701/15-781
Carlos Guestrin, Carnegie Mellon University, September 21st, 2009
© 2005-2009 Carlos Guestrin

Classification

Learn h: X -> Y, where X is the features and Y is the target classes. Suppose you know P(Y|X) exactly: how should you classify? Use the Bayes classifier,

  h_Bayes(x) = argmax_y P(Y = y | X = x).

Why?

Optimal classification

Theorem: the Bayes classifier h_Bayes is optimal. That is, for any other classifier h,

  error(h_Bayes) <= error(h).

The proof rests on Bayes' rule,

  P(Y|X) = P(X|Y) P(Y) / P(X),

which is shorthand for: for all values x, y,

  P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x).

How hard is it to learn the optimal classifier?

Given data, how do we represent the prior P(Y) and the likelihood P(X|Y), and how many parameters do they need? Suppose Y is composed of k classes and X is composed of n binary features. Then P(Y) needs k - 1 parameters, but P(X|Y) needs k(2^n - 1): a complex model, with high variance when data is limited.

Conditional independence

X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given the value of Z, e.g.

  P(X = x | Y = y, Z = z) = P(X = x | Z = z) for all x, y, z.

Equivalently,

  P(X, Y | Z) = P(X | Z) P(Y | Z).

What if features are independent?

Predict Thunder from two conditionally independent features, Lightning and Rain:

  P(Lightning, Rain | Thunder) = P(Lightning | Thunder) P(Rain | Thunder).

The Naïve Bayes assumption

Features are independent given the class:

  P(X1, X2 | Y) = P(X1 | Y) P(X2 | Y),

and more generally,

  P(X1, ..., Xn | Y) = prod_i P(Xi | Y).

How many parameters now? If X is composed of n binary features, the likelihood needs only n parameters per class, i.e. nk in total, instead of k(2^n - 1).

The Naïve Bayes classifier

Given a prior P(Y) and n conditionally independent features Xi given the class Y, with a likelihood P(Xi | Y) for each Xi, the decision rule is

  y* = h_NB(x) = argmax_y P(y) prod_i P(xi | y).

If the assumption holds, NB is the optimal classifier.

MLE for the parameters of NB

Given a dataset of N examples, let Count(A = a, B = b) be the number of examples where A = a and B = b. The MLE for NB is simply:

  Prior:      P(Y = y) = Count(Y = y) / N
  Likelihood: P(Xi = xi | Y = y) = Count(Xi = xi, Y = y) / Count(Y = y)

Subtleties of the NB classifier, 1: violating the NB assumption

Usually, features are not conditionally independent, and the predicted probabilities P(Y|X) are then often biased towards 0 or 1. Nonetheless, NB is the single most used classifier out there, and it often performs well even when the assumption is violated. [Domingos & Pazzani '96] discuss some conditions for good performance.

Subtleties of the NB classifier, 2: insufficient training data

What if you never see a training instance where X1 = a when Y = b? E.g. Y = SpamEmail, X1 = "Enlargement": the MLE gives P(X1 = a | Y = b) = 0. Thus, no matter what values X2, ..., Xn take,

  P(Y = b | X1 = a, X2, ..., Xn) = 0.

What now?

MAP for the Beta distribution

MAP: use the most likely parameter. A Beta prior is equivalent to extra thumbtack flips. As N -> infinity, the prior is forgotten; but for small sample sizes, the prior is important.

Bayesian learning for NB parameters (a.k.a. smoothing)

Given a dataset of N examples, place prior distributions Q(Xi | Y) and Q(Y), equivalent to m virtual examples, and take the MAP estimate of P(Xi | Y). Now, even if you never observe a feature together with a class, the posterior probability is never zero.

Text classification

Classify e-mails: Y = {Spam, NotSpam}. Classify news articles: Y = the topic of the article. Classify webpages: Y = {Student, Professor, Project}. What about the features X? The text itself: X is the entire document, with Xi the i-th word in the article.

NB for text classification

P(X|Y) is huge: an article has at least 1000 words, X = (X1, ..., X1000), and each Xi represents the i-th word in the document, i.e. the domain of Xi is the entire vocabulary (e.g. Webster's Dictionary, or more: 10,000 words and up). The NB assumption helps a lot: P(Xi = xi | Y = y) is just the probability of observing word xi at position i in a document on topic y.

Bag of words model

A typical additional assumption: the position in the document doesn't matter,

  P(Xi = xi | Y = y) = P(Xk = xi | Y = y),

i.e. the order of words on the page is ignored. This sounds really silly, but often works very well. ("When the lecture is over, remember to wake up the person sitting next to you in the lecture room" becomes: in, is, lecture, lecture, next, over, person, remember, room, sitting, the, the, the, to, to, up, wake, when, you.)

In the bag-of-words approach, a document is represented by its word counts, e.g.: aardvark 0, about 2, all 2, Africa 1, apple 0, anxious 0, ..., gas 1, ..., oil 1, ..., Zaire 0.

NB with bag of words for text classification

Learning phase:
  Prior P(Y): count how many documents you have from each topic (plus the prior's virtual examples).
  Likelihood P(Xi | Y): for each topic, count how many times you saw each word in documents of that topic (plus the prior's virtual examples).
Test phase: for each document, use the naïve Bayes decision rule.

The lecture closes with classification results and a learning curve on the Twenty Newsgroups dataset.
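The parameter-count comparison in the notes (full joint P(X|Y) versus naïve Bayes, for k classes and n binary features) is easy to verify numerically. This is a small sketch; the function names are my own, not from the lecture.

```python
def full_joint_params(k: int, n: int) -> int:
    """Free parameters for P(Y) and P(X|Y) with no independence assumption."""
    # Prior P(Y): a distribution over k classes has k - 1 free parameters.
    # Likelihood P(X|Y): for each of the k classes, a distribution over
    # all 2^n binary feature vectors, i.e. 2^n - 1 free parameters per class.
    return (k - 1) + k * (2 ** n - 1)


def naive_bayes_params(k: int, n: int) -> int:
    """Free parameters under the naive Bayes assumption."""
    # Prior is unchanged; the likelihood is now just one Bernoulli
    # parameter P(Xi = 1 | Y = y) per feature per class.
    return (k - 1) + k * n


# Even a modest problem shows the gap:
print(full_joint_params(k=2, n=30))   # 2147483647
print(naive_bayes_params(k=2, n=30))  # 61
```

This is the "complex model, high variance with limited data" point: with 30 binary features you could never collect enough data to estimate two billion parameters, but 61 is easy.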
View Full Document
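Putting the pieces of the notes together (MLE by counting, smoothing with m virtual examples per word, and the bag-of-words decision rule, evaluated in log space so products of small probabilities don't underflow), a minimal sketch in Python might look as follows. The class name and the tiny spam/ham dataset are invented for illustration, not taken from the lecture.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Bag-of-words naive Bayes with m virtual examples per word (smoothing)."""

    def __init__(self, m: float = 1.0):
        self.m = m  # m = 1 corresponds to Laplace smoothing

    def fit(self, docs, labels):
        self.vocab = {w for d in docs for w in d}
        self.class_docs = Counter(labels)        # documents per topic (prior)
        self.word_counts = defaultdict(Counter)  # word counts per topic
        for d, y in zip(docs, labels):
            self.word_counts[y].update(d)
        self.total_words = {y: sum(c.values()) for y, c in self.word_counts.items()}
        self.n_docs = len(docs)

    def predict(self, doc):
        best, best_score = None, -math.inf
        for y in self.class_docs:
            # log prior + sum of log likelihoods (bag of words)
            score = math.log(self.class_docs[y] / self.n_docs)
            denom = self.total_words[y] + self.m * len(self.vocab)
            for w in doc:
                score += math.log((self.word_counts[y][w] + self.m) / denom)
            if score > best_score:
                best, best_score = y, score
        return best


# Invented toy data, just to exercise the code:
docs = [["cheap", "enlargement", "offer"], ["meeting", "notes", "attached"],
        ["cheap", "offer", "now"], ["project", "meeting", "tomorrow"]]
labels = ["spam", "ham", "spam", "ham"]
nb = NaiveBayesText(m=1.0)
nb.fit(docs, labels)
print(nb.predict(["cheap", "meeting", "offer"]))
```

With m = 0 this would reduce to the plain MLE and reintroduce the zero-count problem described in the notes; with m > 0, a word never seen with a class still gets a small nonzero probability, so no single word can force the posterior to zero.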