UCI ICS 273A - Evaluation Methods 273A Spring09

Evaluating Classifiers
Lecture 2
Instructor: Max Welling

Evaluation of Results
• How do you report classification error?
• How certain are you about the error you claim?
• How do you compare two algorithms?
• How certain are you if you state that one algorithm performs better than another?

Evaluation
Given:
• A hypothesis h(x): X -> C, in hypothesis space H, mapping attributes x to classes c = [1, 2, 3, ..., C]
• A data sample S of size n.
Questions:
• What is the error of "h" on unseen data?
• If we have two competing hypotheses, which one is better on unseen data?
• How do we compare two learning algorithms in the face of limited data?
• How certain are we about our answers?

Sample and True Error
We can define two errors:
1) error(h|S) is the error on the sample S:
   $error(h|S) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}[h(x_i) \neq y_i]$
2) error(h|P) is the true error on unseen data sampled from the distribution P(x):
   $error(h|P) = \int dx\, P(x)\, \mathbf{1}[h(x) \neq f(x)]$
   where f(x) is the true hypothesis.

Binomial Distributions
• Assume you toss a coin n times, and it has probability p of coming up heads (which we will call success).
• What is the probability distribution governing the number of heads in n trials?
• Answer: the Binomial distribution,
  $P(\#heads = r \mid p, n) = \frac{n!}{r!\,(n-r)!}\; p^r (1-p)^{n-r}$

Distribution over Errors
• Consider some hypothesis h(x).
• Draw n samples X_k ~ P(X). Do this k times.
• Compute e_1 = n*error(h|X_1), e_2 = n*error(h|X_2), ..., e_k = n*error(h|X_k).
• {e_1, ..., e_k} are samples from a Binomial distribution!
• Why? Imagine a magic coin, where God secretly determines the probability of heads by the following procedure. First He takes some random hypothesis h. Then He draws x ~ P(x) and observes whether h(x) predicts the label correctly. If it does, He makes sure the coin lands heads up...
• You have a single sample S, for which you observe e(S) errors. What do you think would be a reasonable estimate for error(h|P)?

Binomial Moments
  $mean(r) = E[r \mid n, p] = np$
  $var(r) = E[(r - E[r])^2] = np(1-p)$
• If we match the mean, np, with the observed value n*error(h|S) we find:
  $E[error(h|P)] = E[r/n] = p \approx error(h|S)$
• If we match the variance we can obtain an estimate of the width:
  $var[error(h|P)] = var[r/n] \approx \frac{error(h|S)\,(1 - error(h|S))}{n}$

Confidence Intervals
• We would like to state: with N% confidence we believe that error(h|P) is contained in the interval
  $error(h|S) \pm z_N \sqrt{\frac{error(h|S)\,(1 - error(h|S))}{n}}$
• In principle $z_N$ is hard to compute exactly, but for np(1-p) > 5 or n > 30 it is safe to approximate the Binomial by a Gaussian, for which we can easily compute "z-values" from a Normal(0,1) table.
  [Figure: Normal(0,1) density with z_N marking the central N% mass; e.g. N = 80% corresponds to z_N = 1.28.]
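Below is a minimal sketch, not part of the original slides, of how the interval above could be computed. The function name, the synthetic example data, and the use of numpy/scipy are my own assumptions; the Gaussian approximation is only reasonable in the regime stated above (np(1-p) > 5 or n > 30).

```python
# Sketch: sample error and an approximate N% confidence interval for error(h|P).
# Assumes numpy and scipy are available; names below are illustrative, not from the course.
import numpy as np
from scipy.stats import norm

def error_confidence_interval(y_pred, y_true, confidence=0.80):
    """Return error(h|S) and the interval error(h|S) +/- z_N * sqrt(err*(1-err)/n)."""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    n = len(y_true)
    err = float(np.mean(y_pred != y_true))        # error(h|S), the sample error
    z = norm.ppf(0.5 + confidence / 2.0)          # z_N, e.g. ~1.28 for 80% confidence
    half_width = z * np.sqrt(err * (1.0 - err) / n)
    return err, (err - half_width, err + half_width)

# Hypothetical example: 13 mistakes on a test set of n = 100 points, 80% confidence.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=100)
y_pred = y_true.copy()
y_pred[:13] = 1 - y_pred[:13]                     # flip 13 predictions -> error(h|S) = 0.13
print(error_confidence_interval(y_pred, y_true))  # ~ (0.13, (0.087, 0.173))
```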
Bias-Variance
• The estimator error(h|X) is unbiased if $E[error(h|X)] = p$.
• Imagine again that you have infinitely many sample sets X_1, X_2, ... of size n.
• Use these to compute estimates E_1, E_2, ... of p, where E_i = error(h|X_i).
• If the average of E_1, E_2, ... converges to p, then error(h|X) is an unbiased estimator.
• Two unbiased estimators can still differ in their variance (efficiency). Which one do you prefer?

Flow of Thought
• Determine the property you want to know about the future data (e.g. error(h|P)).
• Find an unbiased estimator E for this quantity based on observing data X (e.g. error(h|X)).
• Determine the distribution P(E) of E under the assumption that you have infinitely many sample sets X_1, X_2, ... of some size n (e.g. P(E) = Binomial(p, n) with p = error(h|P)).
• Estimate the parameters of P(E) from an actual data sample S (e.g. p = error(h|S)).
• Compute the mean and variance of P(E) and pray that P(E) is close to a Normal distribution (sums of random variables converge to normal distributions - the central limit theorem).
• State your confidence interval as: with confidence N%, error(h|P) is contained in the interval
  $Y = mean \pm z_N \sqrt{var}$

Assumptions
• We only consider discrete-valued hypotheses (i.e. classes).
• Training data and test data are drawn IID from the same distribution P(x). (IID: independently & identically distributed.)
• The hypothesis must be chosen independently from the data sample S!
• When you obtain a hypothesis from a learning algorithm, split the data into a training set and a testing set. Find the hypothesis using the training set and estimate the error on the testing set.

Comparing Hypotheses
• Assume we would like to compare two hypotheses h1 and h2, which we have tested on two independent samples S1 and S2 of size n1 and n2.
• I.e. we are interested in the quantity
  $d = error(h_1|P) - error(h_2|P)$
• Define an estimator for d:
  $\hat{d} = error(h_1|X_1) - error(h_2|X_2)$
  with X1, X2 sample sets of size n1, n2.
• Since error(h1|S1) and error(h2|S2) are both approximately Normal, their difference is approximately Normal with
  $mean_d \approx error(h_1|S_1) - error(h_2|S_2)$
  $var \approx \frac{error(h_1|S_1)(1 - error(h_1|S_1))}{n_1} + \frac{error(h_2|S_2)(1 - error(h_2|S_2))}{n_2}$
• Hence, with N% confidence we believe that d is contained in the interval
  $mean_d \pm z_N \sqrt{var}$

Paired Tests
• Consider the following data:
  error(h1|s1) = 0.10   error(h2|s1) = 0.11
  error(h1|s2) = 0.20   error(h2|s2) = 0.21
  error(h1|s3) = 0.66   error(h2|s3) = 0.67
  error(h1|s4) = 0.45   error(h2|s4) = 0.46
  and so on.
• We have var(error(h1)) = large and var(error(h2)) = large, and the total variance of error(h1) - error(h2) is their sum. However, h1 is consistently better than h2.
• We ignored the fact that we compare on the same data. We want a different estimator that compares the data one by one.
• You can use a "paired t-test" (e.g. in matlab) to see if the two errors are significantly different, or if one error is significantly larger than the other.

Paired t-test
• Chunk the data up into subsets T_1, ..., T_k with |T_i| > 30.
• On each subset compute the error and compute
  $\delta_i = error(h_1|T_i) - error(h_2|T_i)$
• Now compute
  $\bar{\delta} = \frac{1}{k}\sum_{i=1}^{k}\delta_i, \qquad s_{\bar{\delta}} = \sqrt{\frac{1}{k(k-1)}\sum_{i=1}^{k}(\delta_i - \bar{\delta})^2}$
• State: with N% confidence the difference in error between h1 and h2 is
  $\bar{\delta} \pm t_{N,k-1}\, s_{\bar{\delta}}$
• "t" is the t-statistic, which is related to the Student-t distribution (table 5.6).

Comparing Learning Algorithms
• In general it is a really bad idea to estimate error rates on the same data on which a learning algorithm is trained. WHY?
• So, just as in x-validation, we split the data into k subsets: S -> {T_1, T_2, ..., T_k}.
• Train both learning algorithm 1 (L1) and learning algorithm 2 (L2) on the complement of each subset, {S - T_1, S - T_2, ...}, to produce hypotheses {L1(S - T_i), L2(S - T_i)} for all i.
• Compute for all i:
  $\delta_i = error(L_1(S - T_i)\,|\,T_i) - error(L_2(S - T_i)\,|\,T_i)$
• Note: we train on S - T_i, but test on T_i.
• As in the last slide, perform a paired t-test on these differences to compute an estimate and ...
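As a rough sketch of the paired procedure in the last two slides (the slides mention matlab's paired t-test; this is a Python version), assuming the per-subset errors error(h1|T_i) and error(h2|T_i) have already been computed. The helper name, the example fold errors, and the use of scipy.stats are illustrative assumptions on my part, not part of the course material.

```python
# Sketch: paired comparison of two hypotheses/learners over k subsets T_1..T_k.
# err1[i] and err2[i] are assumed to be error(h1|T_i) and error(h2|T_i).
import numpy as np
from scipy.stats import t as student_t

def paired_error_interval(err1, err2, confidence=0.95):
    """Return delta_bar and the interval delta_bar +/- t_{N,k-1} * s_delta."""
    delta = np.asarray(err1, dtype=float) - np.asarray(err2, dtype=float)
    k = len(delta)
    delta_bar = delta.mean()
    s_delta = np.sqrt(np.sum((delta - delta_bar) ** 2) / (k * (k - 1)))
    t_crit = student_t.ppf(0.5 + confidence / 2.0, df=k - 1)
    return delta_bar, (delta_bar - t_crit * s_delta, delta_bar + t_crit * s_delta)

# Hypothetical per-fold test errors for two learners on the same k = 5 splits.
err1 = [0.10, 0.20, 0.66, 0.45, 0.31]
err2 = [0.12, 0.21, 0.68, 0.47, 0.32]
print(paired_error_interval(err1, err2))
# -> mean difference ~ -0.016 with an interval that excludes 0,
#    i.e. learner 1 is significantly better on these (made-up) folds.
# scipy's built-in paired t-test returns the corresponding statistic and p-value:
# from scipy.stats import ttest_rel; print(ttest_rel(err1, err2))
```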

