CMU CS 10601 - Lecture

Classification
10-601 Machine Learning

Where we are
• Inputs → Density Estimator → probability  ✓
• Inputs → Classifier → predicted category  (today)
• Inputs → Regressor → predicted real number  (later)

Classification
• Assume we want to teach a computer to distinguish between cats and dogs …

Bayes decision rule
• If we know the conditional probability p(x | y) and the class priors p(y), we can determine the appropriate class using Bayes rule:

    P(y = i | x) = P(x | y = i) P(y = i) / P(x)  =  q_i(x)   (by definition)

• We can use q_i(x) to select the appropriate class: we choose class 0 if q_0(x) > q_1(x) and class 1 otherwise.
• Note that p(x) does not affect our decision.
• This is termed the 'Bayes decision rule' and leads to optimal classification: it minimizes our probability of making a mistake.
• However, it is often very hard to compute …

Bayes decision rule
• We can also use the resulting probabilities to determine our confidence in the class assignment by looking at the likelihood ratio:

    L(x) = q_0(x) / q_1(x)

• This is also known as the likelihood ratio test; we will talk more about it later.

Bayes decision rule: Example
[Figure: two normal (Gaussian) class-conditional densities over x, with means μ_1, μ_2 and standard deviations σ_1, σ_2.]

Bayes error
[Figure: P_1(x)P(y=1) and P_0(x)P(y=0) plotted against x, e.g. p(x | y=1), marking the x values for which we will have errors.]
• For the Bayes decision rule we can calculate the probability of an error.
• This is the probability that we assign a sample to the wrong class, also known as the risk.
• The risk for sample x is:

    R(x) = min{ P_1(x)P(y=1), P_0(x)P(y=0) }

• Risk can be used to determine a 'reject' region.
• We can also compute the expected risk (the risk over the entire range of values of x):

    E[r(x)] = ∫ r(x) p(x) dx
            = ∫ min{ p_1(x)p(y=1), p_0(x)p(y=0) } dx        (assuming all values of x are equally likely)
            = p(y=0) ∫_{L_1} p_0(x) dx + p(y=1) ∫_{L_0} p_1(x) dx

• Here L_1 is the region where we assign instances to class 1 (and L_0 the region where we assign them to class 0).

Loss function
• The risk value we computed assumes that both errors (assigning instances of class 1 to class 0 and vice versa) are equally harmful.
• However, this is not always the case. Why?
• In general our goal is to minimize loss, often defined by a loss function L_{0,1}(x), the penalty we pay when assigning instances of class 0 to class 1. The expected loss is:

    E[L] = L_{0,1} ∫_{L_1} p(y=0) p_0(x) dx + L_{1,0} ∫_{L_0} p(y=1) p_1(x) dx
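To make the decision rule concrete, here is a minimal Python sketch (not part of the lecture) that assumes made-up one-dimensional Gaussian class-conditional densities and priors. It evaluates q_i(x) = p(x | y=i) p(y=i) without the common factor p(x), applies the Bayes decision rule, and adds a loss-weighted variant in the spirit of the loss function above; all parameter values are illustrative assumptions.

    import numpy as np

    def gauss_pdf(x, mu, sigma):
        """Gaussian density, standing in for the class-conditional p(x | y)."""
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    # Hypothetical priors p(y) and (mu, sigma) of p(x | y); illustrative only.
    PRIORS = {0: 0.6, 1: 0.4}
    PARAMS = {0: (-1.0, 1.0), 1: (2.0, 1.5)}

    def q(i, x):
        """q_i(x) = p(x | y=i) p(y=i); p(x) is dropped since it does not change which class wins."""
        mu, sigma = PARAMS[i]
        return gauss_pdf(x, mu, sigma) * PRIORS[i]

    def bayes_decision(x):
        """Bayes decision rule for equal losses: class 0 if q_0(x) > q_1(x), else class 1."""
        return 0 if q(0, x) > q(1, x) else 1

    def loss_weighted_decision(x, L01=1.0, L10=5.0):
        """When the two errors are not equally harmful, predict class 1 only if
        L10 * q_1(x) > L01 * q_0(x), where L01 is the penalty for assigning a
        class-0 instance to class 1 (and L10 the reverse)."""
        return 1 if L10 * q(1, x) > L01 * q(0, x) else 0

    x = 0.5
    print(bayes_decision(x), loss_weighted_decision(x), q(0, x) / q(1, x))  # decisions and likelihood ratio L(x)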
Types of classifiers
• We can divide the large variety of classification approaches into roughly three main types:
  1. Instance-based classifiers
     - use the observations directly (no models)
     - e.g., K nearest neighbors
  2. Generative classifiers
     - build a generative statistical model
     - e.g., Bayesian networks
  3. Discriminative classifiers
     - directly estimate a decision rule/boundary
     - e.g., decision trees

Classification
• Assume we want to teach a computer to distinguish between cats and dogs … Several steps:
  1. Feature transformation: How do we encode the picture? A collection of pixels? Do we use the entire image or a subset? …
  2. Model / classifier specification: What type of classifier should we use?
  3. Model / classifier estimation (with regularization): How do we learn the parameters of our classifier? Do we have enough examples to learn a good model?
  4. Feature selection: Do we really need all the features? Can we use a smaller number and still achieve the same (or better) results?

Supervised learning
• Classification is one of the key components of 'supervised learning'.
• Unlike other learning paradigms, in supervised learning the teacher (us) provides the algorithm with the solutions (labels) for some of the instances, and the goal is to generalize so that the resulting model / method can be used to determine the labels of unobserved samples.
[Diagram: the teacher supplies labeled pairs (X, Y); the classifier learns parameters w_1, w_2, … and then maps new inputs X to predicted labels Y.]

K nearest neighbors (KNN)
• A simple, yet surprisingly efficient algorithm.
• Requires the definition of a distance function or similarity measure between samples.
• Select the class based on the majority vote among the k closest points.
• We need to determine an appropriate value for k. What happens if we choose k=1? What if k=3?
• The choice of k influences the 'smoothness' of the resulting classifier.
• In that sense it is similar to kernel methods (discussed later in the course).
• However, the smoothness of the function is determined by the actual distribution of the data (p(x)) and not by a predefined parameter.

The effect of increasing k
• We will be using Euclidean distance to determine what the k nearest neighbors are:

    d(x, x') = √( Σ_i (x_i − x_i')² )

[Figures: KNN decision regions on the same data with k=1, k=3, and k=5, and a comparison of the different k's; for k=3 and k=5, ties are broken using the order Red, Green, Blue.]

A probabilistic interpretation of KNN
• The decision rule of KNN can be viewed using a probabilistic interpretation.
• What KNN is trying to do is approximate the Bayes decision rule on a subset of the data.
• To do that we need to compute certain …
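The following is a minimal sketch of the KNN classifier described above, using made-up 2-D points and labels rather than the data from the slides. It computes Euclidean distances, takes a majority vote among the k nearest training points, and breaks ties with a fixed class order (Red, Green, Blue) as in the k=3 and k=5 examples.

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_query, k=3, tie_order=("red", "green", "blue")):
        """Classify x_query by majority vote among its k nearest training points,
        using Euclidean distance d(x, x') = sqrt(sum_i (x_i - x_i')^2)."""
        dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
        nearest = np.argsort(dists)[:k]                  # indices of the k closest training points
        votes = Counter(y_train[i] for i in nearest)     # class -> number of neighbors voting for it
        top = max(votes.values())
        tied = [c for c, v in votes.items() if v == top]
        return min(tied, key=tie_order.index)            # break ties with the fixed class order

    # Made-up toy data (five labeled 2-D points), for illustration only.
    X_train = np.array([[0.0, 0.0], [0.5, 0.4], [3.0, 3.2], [2.8, 3.0], [0.2, 3.1]])
    y_train = np.array(["red", "red", "green", "green", "blue"])
    print(knn_predict(X_train, y_train, np.array([0.3, 0.2]), k=3))   # majority of the 3 neighbors -> "red"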

