Machine Learning 10-701/15-781, Spring 2008
Theory of Classification and Nonparametric Classifier
Eric Xing
Lecture 2, January 16, 2006
Reading: Chap. 2.5, CB, and handouts

Outline
- What is, theoretically, the best classifier?
- Bayesian decision rule for minimum error
- Nonparametric classifiers (instance-based learning)
  - Nonparametric density estimation
  - K-nearest-neighbor classifier
  - Optimality of kNN

Classification
- Representing data
- Hypothesis (classifier): decision making as dividing a high-dimensional space
- (Figure: distributions of samples from a normal and an abnormal machine.)

Basic Probability Concepts
- A sample space S is the set of all possible outcomes of a conceptual or physical, repeatable experiment. (S can be finite or infinite.)
  - E.g., S may be the set of all possible outcomes of a dice roll: S = {1, 2, 3, 4, 5, 6}
  - E.g., S may be the set of all possible nucleotides of a DNA site: S = {A, T, C, G}
  - E.g., S may be the set of all possible time-space positions of an aircraft on a radar screen: S = [0, R_max] x [0, 360) x [0, +infinity)

Random Variable
- A random variable is a function X(s) that associates a unique numerical value (a token) with every outcome of an experiment. (The value of the r.v. will vary from trial to trial as the experiment is repeated.)
- Discrete r.v.:
  - The outcome of a dice roll
  - The outcome of reading a nucleotide at site i: X_i
  - Binary event and indicator variable: seeing an "A" at a site => X = 1, o.w. X = 0. This describes the true/false outcome of a random event. Can we describe richer outcomes in the same way (i.e., X = 1, 2, 3, 4 for being A, C, G, T)? (Think about what would happen if we take the expectation of X.)
  - Unit-base random vector: X_i = [X_{iA}, X_{iT}, X_{iG}, X_{iC}]^T, e.g., X_i = [0, 0, 1, 0]^T => seeing a "G" at site i
- Continuous r.v.:
  - The outcome of recording the true location of an aircraft: X_true
  - The outcome of observing the measured location of an aircraft: X_obs

Discrete Probability Distribution
- In the discrete case, a probability distribution P on S (and hence on the domain of X) is an assignment of a non-negative real number P(s) to each s in S (or each valid value of x) such that \sum_{s \in S} P(s) = 1 and 0 <= P(s) <= 1.
  - Intuitively, P(s) corresponds to the frequency (or the likelihood) of getting s in the experiments, if repeated many times.
  - Call \theta_s = P(s) the parameters of a discrete probability distribution.
- A probability distribution on a sample space is sometimes called a probability model, in particular if several different distributions are under consideration.
  - Write models as M_1, M_2, and probabilities as P(X | M_1), P(X | M_2).
  - E.g., M_1 may be the appropriate probability distribution if X comes from a fair dice, M_2 the one for a loaded dice.
  - M is usually a two-tuple of (distribution family, distribution parameters).

Discrete Distributions
- Bernoulli distribution: Ber(p)
  P(x) = 1 - p for x = 0 and P(x) = p for x = 1; equivalently, P(x) = p^x (1 - p)^{1 - x}.
- Multinomial distribution: Mult(1, \theta)
  - Multinomial (indicator) variable: X = [X_1, X_2, X_3, X_4, X_5, X_6]^T, with X_j \in {0, 1}, \sum_{j=1}^{6} X_j = 1, where X_j = 1 w.p. \theta_j and \sum_{j=1}^{6} \theta_j = 1.
  - P(x(j)) = P(X_j = 1) = \theta_j, where j indexes the dice face, and
    P(x) = \theta_A^{x_A} \theta_C^{x_C} \theta_G^{x_G} \theta_T^{x_T} = \prod_k \theta_k^{x_k}.
- Multinomial distribution: Mult(n, \theta)
  - Count variable: X = [x_1, ..., x_K]^T, where \sum_j x_j = n.
  - P(x) = \frac{n!}{x_1! x_2! \cdots x_K!} \theta_1^{x_1} \theta_2^{x_2} \cdots \theta_K^{x_K}.
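Not part of the original slides: a minimal Python sketch of the Bernoulli pmf and the Mult(n, theta) count pmf defined above, with made-up parameter values, just to make the formulas concrete.

```python
# A minimal sketch (not from the slides) of the two pmfs above:
# Bernoulli: P(x) = p^x (1-p)^(1-x)
# Multinomial count: P(x) = n!/(x_1! ... x_K!) * prod_k theta_k^{x_k}
# Parameter values are made-up illustration values.
from math import factorial, prod

def bernoulli_pmf(x, p):
    """P(x) = p^x * (1 - p)^(1 - x), for x in {0, 1}."""
    return p**x * (1 - p)**(1 - x)

def multinomial_pmf(counts, theta):
    """P(x) = n! / (x_1! ... x_K!) * prod_k theta_k^{x_k}, with n = sum(counts)."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(c) for c in counts)
    return coef * prod(t**c for t, c in zip(theta, counts))

# A fair six-sided die rolled n = 6 times, observing each face exactly once:
theta_fair = [1 / 6] * 6
print(bernoulli_pmf(1, p=0.3))                          # 0.3
print(multinomial_pmf([1, 1, 1, 1, 1, 1], theta_fair))  # 6!/6^6, about 0.0154
```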
Continuous Probability Distribution
- A continuous random variable X can assume any value in an interval on the real line, or in a region in a high-dimensional space.
- A random vector X = [x_1, x_2, ..., x_n]^T usually corresponds to real-valued measurements of some property, e.g., length, position.
- It is not possible to talk about the probability of the random variable assuming a particular value: P(x) = 0.
- Instead, we talk about the probability of the random variable assuming a value within a given interval or half-interval, e.g., P(X \in [x_1, x_2]), P(X <= x), or an arbitrary Boolean combination of such basic propositions.

Continuous Probability Distribution (cont.)
- The probability of the random variable assuming a value within some given interval from x_1 to x_2 is defined to be the area under the graph of the probability density function between x_1 and x_2:
  probability mass: P(X \in [x_1, x_2]) = \int_{x_1}^{x_2} p(x) dx, noting that \int_{-\infty}^{\infty} p(x) dx = 1.
- Cumulative distribution function (CDF): P(x) = P(X <= x) = \int_{-\infty}^{x} p(x') dx'.
- Probability density function (PDF): p(x) = \frac{d}{dx} P(x), with \int_{-\infty}^{\infty} p(x) dx = 1 and p(x) >= 0 for all x.
- (Figure: car flow on Liberty Bridge, a cooked-up example density.)

Continuous Distributions
- Uniform probability density function:
  p(x) = \frac{1}{b - a} for a <= x <= b, and p(x) = 0 elsewhere.
- Normal (Gaussian) probability density function:
  p(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
  - The distribution is symmetric, and is often illustrated as a bell-shaped curve.
  - Two parameters, \mu (mean) and \sigma (standard deviation), determine the location and shape of the distribution.
  - The highest point on the normal curve is at the mean, which is also the median and the mode.
  - The mean can be any numerical value: negative, zero, or positive.
- Multivariate Gaussian:
  p(X; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (X - \mu)^T \Sigma^{-1} (X - \mu) \right)

Class-Conditional Probability
- Classification-specific distribution: P(X | Y)
  p(X | Y = 1) = p_1(X; \mu_1, \Sigma_1)
  p(X | Y = 2) = p_2(X; \mu_2, \Sigma_2)
- Class prior (i.e., "weight"): P(Y)

The Bayes Rule
- What we have just done leads to the following general expression:
  P(Y | X) = \frac{P(X | Y) P(Y)}{P(X)}
  This is Bayes rule.

The Bayes Decision Rule for Minimum Error
- The a posteriori probability of a sample:
  q_i(X) = P(Y = i | X) = \frac{p(X | Y = i) P(Y = i)}{p(X)} = \frac{\pi_i p_i(X)}{\sum_j \pi_j p_j(X)}
- Bayes test: assign X to class 1 if q_1(X) > q_2(X), and to class 2 otherwise.
- Likelihood ratio: \ell(X) = \frac{p_1(X)}{p_2(X)}; equivalently, decide class 1 when \ell(X) > \pi_2 / \pi_1.
- Discriminant function: h(X), e.g., h(X) = -\log \ell(X), with class 1 decided when h(X) < \log(\pi_1 / \pi_2).

Example of Decision Rules
- When each class-conditional density is a normal (Gaussian) distribution, ...
- We can write the decision boundary analytically in some cases (homework!).

Bayes Error
- We must calculate the probability of error: the probability that a sample is assigned to the wrong class.
- Given a datum X, what is the risk? It is the smaller posterior, r(X) = min[q_1(X), q_2(X)].
- The Bayes error is the expected risk: E[r(X)] = \int min[\pi_1 p_1(x), \pi_2 p_2(x)] dx.

More on Bayes Error
- The Bayes error is the lower bound on the probability of classification error.
- The Bayes classifier is the theoretically best classifier, the one that minimizes the probability of classification error.
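A numerical companion, not part of the original slides: a short Python sketch of the Bayes decision rule with two 1-D Gaussian class-conditional densities, plus a Monte Carlo estimate of the Bayes error. All priors, means, and standard deviations below are invented for illustration.

```python
# A minimal sketch (not from the slides): Bayes decision rule for two classes
# with 1-D Gaussian class-conditionals, and a Monte Carlo estimate of the
# Bayes error E[min_i q_i(X)]. Parameter values are made-up illustrations.
import numpy as np
from scipy.stats import norm

pi = np.array([0.6, 0.4])      # class priors P(Y=1), P(Y=2)
mu = np.array([0.0, 2.0])      # class-conditional means
sigma = np.array([1.0, 1.0])   # class-conditional standard deviations

def posteriors(x):
    """q_i(x) = pi_i * p_i(x) / sum_j pi_j * p_j(x)."""
    joint = pi * norm.pdf(x, loc=mu, scale=sigma)   # pi_i * p_i(x), i = 1, 2
    return joint / joint.sum()

def bayes_decide(x):
    """Minimum-error rule: pick the class with the largest posterior."""
    return int(np.argmax(posteriors(x))) + 1        # classes labeled 1 and 2

print(posteriors(1.0), bayes_decide(1.0))

# Monte Carlo estimate of the Bayes error: draw X from the marginal p(X)
# and average the smaller posterior min[q_1(X), q_2(X)].
rng = np.random.default_rng(0)
y = rng.choice([0, 1], size=100_000, p=pi)
x = rng.normal(mu[y], sigma[y])
j1 = pi[0] * norm.pdf(x, mu[0], sigma[0])
j2 = pi[1] * norm.pdf(x, mu[1], sigma[1])
print("estimated Bayes error:", np.mean(np.minimum(j1, j2) / (j1 + j2)))
```

For equal-variance Gaussians like these, the decision boundary is the single point where \pi_1 p_1(x) = \pi_2 p_2(x), which is the kind of analytical boundary the "Example of Decision Rules" slide alludes to.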