CMU CS 10601 - Lecture

Classification
10-601 Machine Learning

Where we are
• Inputs → Density Estimator → probability  ✓
• Inputs → Classifier → predicted category  (today)
• Inputs → Regressor → predicted real number  (later)

Classification
• Assume we want to teach a computer to distinguish between cats and dogs …

Bayes decision rule
• If we know the conditional probability p(x | y) and the class priors p(y), we can determine the appropriate class using Bayes rule:

    P(y = i | x) = P(x | y = i) P(y = i) / P(x)  =  q_i(x)   (by definition)

• We can use q_i(x) to select the appropriate class: we choose class 0 if q_0(x) > q_1(x) and class 1 otherwise.
• Note that p(x) does not affect our decision.
• This is termed the 'Bayes decision rule' and leads to optimal classification: it minimizes our probability of making a mistake.
• However, it is often very hard to compute …

Bayes decision rule
• We can also use the resulting probabilities to determine our confidence in the class assignment by looking at the likelihood ratio:

    L(x) = q_0(x) / q_1(x)

• This is also known as the likelihood ratio test; we will talk more about it later.

Bayes decision rule: Example
[Figure: two normal (Gaussian) class-conditional densities over x, with means μ_1, μ_2 and standard deviations σ_1, σ_2.]

Bayes error
[Figure: P_1(x)P(y=1) and P_0(x)P(y=0) plotted against x, e.g. p(x | y=1), marking the x values for which we will have errors.]
• For the Bayes decision rule we can calculate the probability of an error.
• This is the probability that we assign a sample to the wrong class, also known as the risk.
• The risk for sample x is:

    R(x) = min{ P_1(x)P(y=1), P_0(x)P(y=0) }

• Risk can be used to determine a 'reject' region.
• We can also compute the expected risk (the risk over the entire range of values of x):

    E[r(x)] = ∫ r(x) p(x) dx
            = ∫ min{ p_1(x)p(y=1), p_0(x)p(y=0) } dx        (assuming all values of x are equally likely)
            = p(y=0) ∫_{L_1} p_0(x) dx + p(y=1) ∫_{L_0} p_1(x) dx

• Here L_1 is the region where we assign instances to class 1 (and L_0 the region where we assign them to class 0).

Loss function
• The risk value we computed assumes that both errors (assigning instances of class 1 to class 0 and vice versa) are equally harmful.
• However, this is not always the case. Why?
• In general our goal is to minimize loss, often defined by a loss function L_{0,1}(x), the penalty we pay when assigning instances of class 0 to class 1. The expected loss is:

    E[L] = L_{0,1} ∫_{L_1} p(y=0) p_0(x) dx + L_{1,0} ∫_{L_0} p(y=1) p_1(x) dx
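To make the decision rule concrete, here is a minimal Python sketch (not part of the lecture) that assumes made-up one-dimensional Gaussian class-conditional densities and priors. It evaluates q_i(x) = p(x | y=i) p(y=i) without the common factor p(x), applies the Bayes decision rule, and adds a loss-weighted variant in the spirit of the loss function above; all parameter values are illustrative assumptions.

    import numpy as np

    def gauss_pdf(x, mu, sigma):
        """Gaussian density, standing in for the class-conditional p(x | y)."""
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    # Hypothetical priors p(y) and (mu, sigma) of p(x | y); illustrative only.
    PRIORS = {0: 0.6, 1: 0.4}
    PARAMS = {0: (-1.0, 1.0), 1: (2.0, 1.5)}

    def q(i, x):
        """q_i(x) = p(x | y=i) p(y=i); p(x) is dropped since it does not change which class wins."""
        mu, sigma = PARAMS[i]
        return gauss_pdf(x, mu, sigma) * PRIORS[i]

    def bayes_decision(x):
        """Bayes decision rule for equal losses: class 0 if q_0(x) > q_1(x), else class 1."""
        return 0 if q(0, x) > q(1, x) else 1

    def loss_weighted_decision(x, L01=1.0, L10=5.0):
        """When the two errors are not equally harmful, predict class 1 only if
        L10 * q_1(x) > L01 * q_0(x), where L01 is the penalty for assigning a
        class-0 instance to class 1 (and L10 the reverse)."""
        return 1 if L10 * q(1, x) > L01 * q(0, x) else 0

    x = 0.5
    print(bayes_decision(x), loss_weighted_decision(x), q(0, x) / q(1, x))  # decisions and likelihood ratio L(x)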
Types of classifiers
• We can divide the large variety of classification approaches into roughly three main types:
  1. Instance-based classifiers
     - use the observations directly (no models)
     - e.g., K nearest neighbors
  2. Generative classifiers
     - build a generative statistical model
     - e.g., Bayesian networks
  3. Discriminative classifiers
     - directly estimate a decision rule/boundary
     - e.g., decision trees

Classification
• Assume we want to teach a computer to distinguish between cats and dogs … Several steps:
  1. Feature transformation: How do we encode the picture? A collection of pixels? Do we use the entire image or a subset? …
  2. Model / classifier specification: What type of classifier should we use?
  3. Model / classifier estimation (with regularization): How do we learn the parameters of our classifier? Do we have enough examples to learn a good model?
  4. Feature selection: Do we really need all the features? Can we use a smaller number and still achieve the same (or better) results?

Supervised learning
• Classification is one of the key components of 'supervised learning'.
• Unlike other learning paradigms, in supervised learning the teacher (us) provides the algorithm with the solutions (labels) for some of the instances, and the goal is to generalize so that the resulting model / method can be used to determine the labels of unobserved samples.
[Diagram: the teacher supplies labeled pairs (X, Y); the classifier learns parameters w_1, w_2, … and then maps new inputs X to predicted labels Y.]

K nearest neighbors (KNN)
• A simple, yet surprisingly efficient algorithm.
• Requires the definition of a distance function or similarity measure between samples.
• Select the class based on the majority vote among the k closest points.
• We need to determine an appropriate value for k. What happens if we choose k=1? What if k=3?
• The choice of k influences the 'smoothness' of the resulting classifier.
• In that sense it is similar to kernel methods (discussed later in the course).
• However, the smoothness of the function is determined by the actual distribution of the data (p(x)) and not by a predefined parameter.

The effect of increasing k
• We will be using Euclidean distance to determine what the k nearest neighbors are:

    d(x, x') = √( Σ_i (x_i − x_i')² )

[Figures: KNN decision regions on the same data with k=1, k=3, and k=5, and a comparison of the different k's; for k=3 and k=5, ties are broken using the order Red, Green, Blue.]

A probabilistic interpretation of KNN
• The decision rule of KNN can be viewed using a probabilistic interpretation.
• What KNN is trying to do is approximate the Bayes decision rule on a subset of the data.
• To do that we need to compute certain …
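The following is a minimal sketch of the KNN classifier described above, using made-up 2-D points and labels rather than the data from the slides. It computes Euclidean distances, takes a majority vote among the k nearest training points, and breaks ties with a fixed class order (Red, Green, Blue) as in the k=3 and k=5 examples.

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_query, k=3, tie_order=("red", "green", "blue")):
        """Classify x_query by majority vote among its k nearest training points,
        using Euclidean distance d(x, x') = sqrt(sum_i (x_i - x_i')^2)."""
        dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
        nearest = np.argsort(dists)[:k]                  # indices of the k closest training points
        votes = Counter(y_train[i] for i in nearest)     # class -> number of neighbors voting for it
        top = max(votes.values())
        tied = [c for c, v in votes.items() if v == top]
        return min(tied, key=tie_order.index)            # break ties with the fixed class order

    # Made-up toy data (five labeled 2-D points), for illustration only.
    X_train = np.array([[0.0, 0.0], [0.5, 0.4], [3.0, 3.2], [2.8, 3.0], [0.2, 3.1]])
    y_train = np.array(["red", "red", "green", "green", "blue"])
    print(knn_predict(X_train, y_train, np.array([0.3, 0.2]), k=3))   # majority of the 3 neighbors -> "red"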

