DOC PREVIEW
UCI ICS 273A - Evaluating Classifiers

This preview shows page 1-2-3-4-5-6 out of 17 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Evaluating ClassifiersEvaluation of ResultsEvaluationSample and True ErrorBinomial DistributionsDistribution over ErrorsBinomial MomentsConfidence IntervalsBias-VarianceFlow of ThoughtAssumptionsComparing HypothesesPaired TestsPaired t-testComparing Learning AlgorithmsEvaluation: ROC curvesConclusionEvaluating ClassifiersLecture 2Instructor: Max WellingEvaluation of Results•How do you report classification error?•How certain are you about the error you claim?•How do you compare two algorithms?•How certain are you if you state one algorithm performs better than another?Evaluation Given:•Hypothesis h(x): XC, in hypothesis space H, mapping attributes x to classes c=[1,2,3,...C]•A data-sample S(n) of size n. Questions:• What is the error of “h” on unseen data?• If we have two competing hypotheses, which one is better on unseen data?• How do we compare two learning algorithms in the face of limited data?• How certain are we about our answers?Sample and True ErrorWe can define two errors:1) Error(h|S) is the error on the sample S:2) Error(h|P) is the true error on the unseen data sampled from the distribution P(x): where f(x) is the true hypothesis.11( | ) [ ( ) ]ni iierror h S h x yn== ��( | ) ( ) [ ( ) ( )]error h P dx P x h x f x= ��Binomial Distributions• Assume you toss a coin n times. • And it has probability p of coming heads (which we will call success)• What is the probability distribution governing the number of heads in n trials?• Answer: the Binomial distribution.!(# | , ) (1 )!( )!r n rnp heads r p n p pr n r-= = --Distribution over Errors• Consider some hypothesis h(x)• Draw n samples Xk~P(X).• Do this k times.• Compute e1=n*error(h|X1), e2=n*error(h|X2),...,ek=n*error(h|Xk).• {e1,...,ek} are samples from a Binomial distribution !• Why? imagine a magic coin, where God secretly determines the probability of heads by the following procedure. First He takes some random hypothesis h. Then, He draws x~P(x) and observes if h(x) correctly predicts the label correctly. If it does, he makes sure the coin lands heads up... • You have a single sample S, for which you observe e(S) errors. What would be a reasonable estimate for Error(h|P) you think?Binomial Moments2( ) [ | , ]( ) [( [ ]) ] (1 )mean r E r n p npvar r E r E r np p= == - = -meanvar• If we match the mean, np, with the observed value n*error(h|S) we find:• If we match the variance we can obtain an estimate of the width:[ ( | )] [ / ] ( | )E error h P E r n p error h S= = �( | ) (1 ( | ))var[ ( | )] var[ / ]error h S error h Serror h P r nn� -= �Confidence Intervals• We would like to state:With N% confidence we believe that error(h|P) is contained in the interval:( | ) (1 ( | ))( | ) ( | )Nerror h S error h Serror h P error h S zn� -= �• In principle is hard to compute exactly, but for np(1-p)>5 or n>30 it is safe to approximate a Binomial by a Gaussian for which we can easily compute “z-values”.NzNormal(0,1)80%1.28�0.81.28z =Bias-Variance• The estimator is unbiased if• Imagine again you have infinitely many sample sets X1,X2,.. of size n.• Use these to compute estimates E1,E2,... of p where Ei=error(h|Xi)• If the average of E1,E2,.. converges to p, then error(h|X) is an unbiased estimator.• Two unbiased estimators can still differ in their variance (efficiency). Which one do you prefer?[ ( | )]E error h X p=p EavFlow of Thought• Determine the property you want to know about the future data (e.g. error(h|P))• Find an unbiased estimator E for this quantity based on observing data X (e.g. error(h|X))• Determine the distribution P(E) of E under the assumption you have infinitely many sample sets X1,X2,...of some size n. (e.g. p(E)=Binomial(p,n), p=error(h|P))• Estimate the parameters of P(E) from an actual data sample S (e.g. p=error(h|S))• Compute mean and variance of P(E) and pray P(E) it is close to a Normal distribution. (sums of random variables converge to normal distributions – central limit theorem)• State you confidence interval as: with confidence N% error(h|P) is contained in the interval varNY mean z= �Assumptions• We only consider discrete valued hypotheses (i.e. classes)• Training data and test data are drawn IID from the same distribution P(x). (IID: independently & identically distributed)• The hypothesis must be chosen independently from the data sample S!• When you obtain a hypothesis from a learning algorithm, split the data into a training set and a testing set. Find the hypothesis using the training set and estimate error on the testing set.Comparing Hypotheses• Assume we like to compare 2 hypothesis h1 and h2, which we have tested on two independent samples S1 and S2 of size n1 and n2.• I.e. we are interested in the quantity: ?• Define estimator for d: with X1,X2 sample sets of size n1,n2.• Since error(h1|S1) and error(h2|S2) are both approximately Normal their difference is approximately Normal with: • Hence, with N% confidence we believe that d is contained in the interval:( 1| ) ( 2| )d error h P error h P= -ˆ( 1| 1) ( 2| 2)d error h X error h X= -( 1| 1) ( 2| 2)( 1| 1)(1 ( 1| 1)) ( 2| 2)(1 ( 2| 2))var1 2mean d error h S error h Serror h S error h S error h S error h Sn n= � -- -� +varNd mean z= �Paired Tests• Consider the following data:error(h1|s1)=0.1 error(h2|s1)=0.11error(h1|s2)=0.2 error(h2|s2)=0.21error(h1|s3)=0.66 error(h2|s3)=0.67error(h1|s4)=0.45 error(h2|s4)=0.46and so on.• We have var(error(h1)) = large, var(error(h2)) = large. The total variance of error(h1)-error(h2) is their sum. However, h1 is consistently better than h2.• We ignored the fact that we compare on the same data. We want a different estimator that compares data one by one. • You can use a “paired t-test” (e.g. in matlab) to see if the two errors are significantly different, or if one error is significantly larger than the other.Paired t-test• Chunk the data up in subsets T1,...,Tk with |Ti|>30• On each subset compute the error and compute:• Now compute: • State: With N% confidence the difference in error between h1 and h2 is:• “t” is the t-statistic which is related to the student-t distribution (table 5.6). ( 1| ) ( 2| )i i ierror h T error h Td = -12111( ) ( )( 1)kiikiiksk kd dd d d==== --��, 1( )N kt sd d-�Comparing Learning Algorithms• In general it is a really bad idea to


View Full Document

UCI ICS 273A - Evaluating Classifiers

Documents in this Course
Load more
Download Evaluating Classifiers
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Evaluating Classifiers and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Evaluating Classifiers 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?