Machine Learning 1010 701 15701 15 781 Spring 2008 Theory of Classification and Nonparametric Classifier Eric Xing Lecture 2 January 16 2006 Reading Chap 2 5 CB and handouts Miscellaneous issues z Office hour z Class website z Auditing z Class project 1 Outline z What is theoretically the best classifier z Bayesian decision rule for Minimum Error z Nonparametric Classifier Instance based learning z Nonparametric density estimation z K nearest neighbor classifier z Optimality of kNN Classification z Representing data z Hypothesis classifier 2 Decision making as dividing a high dimensional space z Distributions of samples from normal and abnormal machine Basic Probability Concepts z A sample space S is the set of all possible outcomes of a conceptual or physical repeatable experiment S can be finite or infinite z E g S may be the set of all possible outcomes of a dice roll S 1 2 3 4 5 6 z E g S may be the set of all possible nucleotides of a DNA site S A T C G z E g S may be the set of all possible positions time space positions of a aircraft on a radar screen S 0 Rmax 0 360 o 0 3 Random Variable z A random variable is a function that associates a unique numerical value a token with every outcome of an experiment The value of the r v will vary from trial to trial as the experiment is repeated X S z Discrete r v z z The outcome of a dice roll z The outcome of reading a nt at site i Xi x1 x2 xn T Binary event and indicator variable z Seeing an A at a site X 1 o w X 0 z This describes the true or false outcome a random event z Can we describe richer outcomes in the same way i e X 1 2 3 4 for being A C G T think about what would happen if we take expectation of X z Unit Base Random vector z Continuous r v Xi XiA XiT XiG XiC Xi 0 0 1 0 seeing a G at site i z z The outcome of recording the true location of an aircraft Xtrue The outcome of observing the measured location of an aircraft Xobs Discrete Prob Distribution z In the discrete case a probability distribution P on S and hence on the domain of X is an assignment of a non negative real number P s to each s S or each valid value of x such that s SP s 1 0 P s 1 z z z intuitively P s corresponds to the frequency or the likelihood of getting s in the experiments if repeated many times call s P s the parameters in a discrete probability distribution A probability distribution on a sample space is sometimes called a probability model in particular if several different distributions are under consideration z z z write models as M1 M2 probabilities as P X M1 P X M2 e g M1 may be the appropriate prob dist if X is from fair dice M2 is for the loaded dice M is usually a two tuple of dist family dist parameters 4 Discrete Distributions z Bernoulli distribution Ber p 1 p for x 0 for x 1 p P x z P x p x 1 p 1 x Multinomial distribution Mult 1 z Multinomial indicator variable X 1 2 X X 3 X 4 X X 5 6 X X j 0 1 and X j 1 j 1 6 where X j 1 w p j j 1 j 1 6 p x j P X j 1 where j index the dice face j A C xA xC G xG T xT k xk x k Discrete Distributions z Multinomial distribution Mult n z Count variable x1 X M x K p x where x j n j 1 2 K n n 1 x 2 x L K x 1 2 x x 1 x 2 L x K x x L x K 5 Continuous Prob Distribution z A continuous random variable X can assume any value in an interval on the real line or in a region in a high dimensional space z A random vector X x1 x2 xn T usually corresponds to a real valued measurements of some property e g length position z It is not possible to talk about the probability of the random variable assuming a particular value P x 0 z Instead we talk about the probability of the random variable assuming a value within a given interval or half interval z P X x1 x 2 P X x P X x z Arbitrary Boolean combination of basic propositions Continuous Prob Distribution z The probability of the random variable assuming a value within some given interval from x1 to x2 is defined to be the area under the graph of the probability density function between x1 and x2 z x Probability mass P X x1 x 2 2 p x dx x1 note that z p x dx 1 Cumulative distribution function CDF x P x P X x p x dx z Probability density function PDF p x d P x dx p x dx 1 Car flow on Liberty Bridge cooked up p x 0 x 6 Continuous Distributions z Uniform Probability Density Function p x 1 b a 0 z elsewhere Normal Gaussian Probability Density Function p x z for a x b f x 2 2 1 e x 2 2 z The distribution is symmetric and is often illustrated as a bell shaped curve z Two parameters mean and standard deviation determine the location and shape of the distribution z The highest point on the normal curve is at the mean which is also the median and mode z The mean can be any numerical value negative zero or positive x Multivariate Gaussian r p X 1 2 n 2 1 2 rT r 1 exp X 1 X 2 Class Conditional Probability z Classification specific Dist P X Y p X Y 1 r p1 X 1 1 p X Y 2 r p2 X 2 2 z Class prior i e weight P Y 7 The Bayes Rule z What we have just did leads to the following general expression P Y X P X Y p Y P X This is Bayes Rule The Bayes Decision Rule for Minimum Error z The a posteriori probability of a sample P Y i X i pi X p X Y i P Y i qi X p X i i pi X z Bayes Test z Likelihood Ratio l X z Discriminant function h X 8 Example of Decision Rules z When each class is a normal z We can write the decision boundary analytically in some cases homework Bayes Error z We must calculate the probability of error z the probability that a sample is assigned to the wrong class z Given a datum X what is the risk z The Bayes error the expected risk 9 More on Bayes …
View Full Document