Machine Learning 10-701/15-781, Spring 2008
Theory of Classification and Nonparametric Classifier
Eric Xing
Lecture 2, January 16, 2006
Reading: Chap. 2.5, CB, and handouts

Outline
- What is, theoretically, the best classifier?
- Bayesian decision rule for minimum error
- Nonparametric classifiers (instance-based learning)
  - Nonparametric density estimation
  - K-nearest-neighbor classifier
  - Optimality of kNN

Classification
- Representing data
- Hypothesis (classifier): decision making as dividing a high-dimensional space
- (Figure: distributions of samples from a normal and an abnormal machine.)

Basic Probability Concepts
- A sample space S is the set of all possible outcomes of a conceptual or physical, repeatable experiment. (S can be finite or infinite.)
  - E.g., S may be the set of all possible outcomes of a dice roll: S = {1, 2, 3, 4, 5, 6}
  - E.g., S may be the set of all possible nucleotides of a DNA site: S = {A, T, C, G}
  - E.g., S may be the set of all possible time-space positions of an aircraft on a radar screen: S = [0, R_max] x [0, 360) x [0, +infinity)

Random Variable
- A random variable is a function X(s) that associates a unique numerical value (a token) with every outcome of an experiment. (The value of the r.v. will vary from trial to trial as the experiment is repeated.)
- Discrete r.v.:
  - The outcome of a dice roll
  - The outcome of reading a nucleotide at site i: X_i
  - Binary event and indicator variable: seeing an "A" at a site => X = 1, o.w. X = 0. This describes the true/false outcome of a random event. Can we describe richer outcomes in the same way (i.e., X = 1, 2, 3, 4 for being A, C, G, T)? (Think about what would happen if we take the expectation of X.)
  - Unit-base random vector: X_i = [X_{iA}, X_{iT}, X_{iG}, X_{iC}]^T, e.g., X_i = [0, 0, 1, 0]^T => seeing a "G" at site i
- Continuous r.v.:
  - The outcome of recording the true location of an aircraft: X_true
  - The outcome of observing the measured location of an aircraft: X_obs

Discrete Probability Distribution
- In the discrete case, a probability distribution P on S (and hence on the domain of X) is an assignment of a non-negative real number P(s) to each s in S (or each valid value of x) such that \sum_{s \in S} P(s) = 1 and 0 <= P(s) <= 1.
  - Intuitively, P(s) corresponds to the frequency (or the likelihood) of getting s in the experiments, if repeated many times.
  - Call \theta_s = P(s) the parameters of a discrete probability distribution.
- A probability distribution on a sample space is sometimes called a probability model, in particular if several different distributions are under consideration.
  - Write models as M_1, M_2, and probabilities as P(X | M_1), P(X | M_2).
  - E.g., M_1 may be the appropriate probability distribution if X comes from a fair dice, M_2 the one for a loaded dice.
  - M is usually a two-tuple of (distribution family, distribution parameters).

Discrete Distributions
- Bernoulli distribution: Ber(p)
  P(x) = 1 - p for x = 0 and P(x) = p for x = 1; equivalently, P(x) = p^x (1 - p)^{1 - x}.
- Multinomial distribution: Mult(1, \theta)
  - Multinomial (indicator) variable: X = [X_1, X_2, X_3, X_4, X_5, X_6]^T, with X_j \in {0, 1}, \sum_{j=1}^{6} X_j = 1, where X_j = 1 w.p. \theta_j and \sum_{j=1}^{6} \theta_j = 1.
  - P(x(j)) = P(X_j = 1) = \theta_j, where j indexes the dice face, and
    P(x) = \theta_A^{x_A} \theta_C^{x_C} \theta_G^{x_G} \theta_T^{x_T} = \prod_k \theta_k^{x_k}.
- Multinomial distribution: Mult(n, \theta)
  - Count variable: X = [x_1, ..., x_K]^T, where \sum_j x_j = n.
  - P(x) = \frac{n!}{x_1! x_2! \cdots x_K!} \theta_1^{x_1} \theta_2^{x_2} \cdots \theta_K^{x_K}.
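Not part of the original slides: a minimal Python sketch of the Bernoulli pmf and the Mult(n, theta) count pmf defined above, with made-up parameter values, just to make the formulas concrete.

```python
# A minimal sketch (not from the slides) of the two pmfs above:
# Bernoulli: P(x) = p^x (1-p)^(1-x)
# Multinomial count: P(x) = n!/(x_1! ... x_K!) * prod_k theta_k^{x_k}
# Parameter values are made-up illustration values.
from math import factorial, prod

def bernoulli_pmf(x, p):
    """P(x) = p^x * (1 - p)^(1 - x), for x in {0, 1}."""
    return p**x * (1 - p)**(1 - x)

def multinomial_pmf(counts, theta):
    """P(x) = n! / (x_1! ... x_K!) * prod_k theta_k^{x_k}, with n = sum(counts)."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(c) for c in counts)
    return coef * prod(t**c for t, c in zip(theta, counts))

# A fair six-sided die rolled n = 6 times, observing each face exactly once:
theta_fair = [1 / 6] * 6
print(bernoulli_pmf(1, p=0.3))                          # 0.3
print(multinomial_pmf([1, 1, 1, 1, 1, 1], theta_fair))  # 6!/6^6, about 0.0154
```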
Continuous Probability Distribution
- A continuous random variable X can assume any value in an interval on the real line, or in a region in a high-dimensional space.
- A random vector X = [x_1, x_2, ..., x_n]^T usually corresponds to real-valued measurements of some property, e.g., length, position.
- It is not possible to talk about the probability of the random variable assuming a particular value: P(x) = 0.
- Instead, we talk about the probability of the random variable assuming a value within a given interval or half-interval, e.g., P(X \in [x_1, x_2]), P(X <= x), or an arbitrary Boolean combination of such basic propositions.

Continuous Probability Distribution (cont.)
- The probability of the random variable assuming a value within some given interval from x_1 to x_2 is defined to be the area under the graph of the probability density function between x_1 and x_2:
  probability mass: P(X \in [x_1, x_2]) = \int_{x_1}^{x_2} p(x) dx, noting that \int_{-\infty}^{\infty} p(x) dx = 1.
- Cumulative distribution function (CDF): P(x) = P(X <= x) = \int_{-\infty}^{x} p(x') dx'.
- Probability density function (PDF): p(x) = \frac{d}{dx} P(x), with \int_{-\infty}^{\infty} p(x) dx = 1 and p(x) >= 0 for all x.
- (Figure: car flow on Liberty Bridge, a cooked-up example density.)

Continuous Distributions
- Uniform probability density function:
  p(x) = \frac{1}{b - a} for a <= x <= b, and p(x) = 0 elsewhere.
- Normal (Gaussian) probability density function:
  p(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
  - The distribution is symmetric, and is often illustrated as a bell-shaped curve.
  - Two parameters, \mu (mean) and \sigma (standard deviation), determine the location and shape of the distribution.
  - The highest point on the normal curve is at the mean, which is also the median and the mode.
  - The mean can be any numerical value: negative, zero, or positive.
- Multivariate Gaussian:
  p(X; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (X - \mu)^T \Sigma^{-1} (X - \mu) \right)

Class-Conditional Probability
- Classification-specific distribution: P(X | Y)
  p(X | Y = 1) = p_1(X; \mu_1, \Sigma_1)
  p(X | Y = 2) = p_2(X; \mu_2, \Sigma_2)
- Class prior (i.e., "weight"): P(Y)

The Bayes Rule
- What we have just done leads to the following general expression:
  P(Y | X) = \frac{P(X | Y) P(Y)}{P(X)}
  This is Bayes rule.

The Bayes Decision Rule for Minimum Error
- The a posteriori probability of a sample:
  q_i(X) = P(Y = i | X) = \frac{p(X | Y = i) P(Y = i)}{p(X)} = \frac{\pi_i p_i(X)}{\sum_j \pi_j p_j(X)}
- Bayes test: assign X to class 1 if q_1(X) > q_2(X), and to class 2 otherwise.
- Likelihood ratio: \ell(X) = \frac{p_1(X)}{p_2(X)}; equivalently, decide class 1 when \ell(X) > \pi_2 / \pi_1.
- Discriminant function: h(X), e.g., h(X) = -\log \ell(X), with class 1 decided when h(X) < \log(\pi_1 / \pi_2).

Example of Decision Rules
- When each class-conditional density is a normal (Gaussian) distribution, ...
- We can write the decision boundary analytically in some cases (homework!).

Bayes Error
- We must calculate the probability of error: the probability that a sample is assigned to the wrong class.
- Given a datum X, what is the risk? It is the smaller posterior, r(X) = min[q_1(X), q_2(X)].
- The Bayes error is the expected risk: E[r(X)] = \int min[\pi_1 p_1(x), \pi_2 p_2(x)] dx.

More on Bayes Error
- The Bayes error is the lower bound on the probability of classification error.
- The Bayes classifier is the theoretically best classifier, the one that minimizes the probability of classification error.
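A numerical companion, not part of the original slides: a short Python sketch of the Bayes decision rule with two 1-D Gaussian class-conditional densities, plus a Monte Carlo estimate of the Bayes error. All priors, means, and standard deviations below are invented for illustration.

```python
# A minimal sketch (not from the slides): Bayes decision rule for two classes
# with 1-D Gaussian class-conditionals, and a Monte Carlo estimate of the
# Bayes error E[min_i q_i(X)]. Parameter values are made-up illustrations.
import numpy as np
from scipy.stats import norm

pi = np.array([0.6, 0.4])      # class priors P(Y=1), P(Y=2)
mu = np.array([0.0, 2.0])      # class-conditional means
sigma = np.array([1.0, 1.0])   # class-conditional standard deviations

def posteriors(x):
    """q_i(x) = pi_i * p_i(x) / sum_j pi_j * p_j(x)."""
    joint = pi * norm.pdf(x, loc=mu, scale=sigma)   # pi_i * p_i(x), i = 1, 2
    return joint / joint.sum()

def bayes_decide(x):
    """Minimum-error rule: pick the class with the largest posterior."""
    return int(np.argmax(posteriors(x))) + 1        # classes labeled 1 and 2

print(posteriors(1.0), bayes_decide(1.0))

# Monte Carlo estimate of the Bayes error: draw X from the marginal p(X)
# and average the smaller posterior min[q_1(X), q_2(X)].
rng = np.random.default_rng(0)
y = rng.choice([0, 1], size=100_000, p=pi)
x = rng.normal(mu[y], sigma[y])
j1 = pi[0] * norm.pdf(x, mu[0], sigma[0])
j2 = pi[1] * norm.pdf(x, mu[1], sigma[1])
print("estimated Bayes error:", np.mean(np.minimum(j1, j2) / (j1 + j2)))
```

For equal-variance Gaussians like these, the decision boundary is the single point where \pi_1 p_1(x) = \pi_2 p_2(x), which is the kind of analytical boundary the "Example of Decision Rules" slide alludes to.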