Evaluating Hypotheses
IEEE Expert, October 1996

Evaluating Hypotheses
• Sample error, true error
• Confidence intervals for observed hypothesis error
• Estimators
• Binomial distribution, Normal distribution, Central Limit Theorem
• Paired t tests
• Comparing learning methods

Evaluating Hypotheses and Learners
Consider hypotheses H1 and H2 learned by learners L1 and L2.
• How can we learn H and estimate its accuracy with limited data?
• How well does the observed accuracy of H over a limited sample estimate its accuracy over unseen data?
• If H1 outperforms H2 on the sample, will H1 outperform H2 in general?
• Can we draw the same conclusion for L1 and L2?

Two Definitions of Error
The true error of hypothesis h with respect to target function f and distribution D is the probability that h will misclassify an instance drawn at random according to D:

  error_D(h) ≡ Pr_{x∈D}[ f(x) ≠ h(x) ]

The sample error of h with respect to target function f and data sample S is the proportion of examples h misclassifies:

  error_S(h) ≡ (1/n) Σ_{x∈S} δ(f(x) ≠ h(x))

where δ(f(x) ≠ h(x)) is 1 if f(x) ≠ h(x), and 0 otherwise.
How well does error_S(h) estimate error_D(h)?

Problems Estimating Error
1. Bias: if S is the training set, error_S(h) is optimistically biased:

  bias ≡ E[error_S(h)] − error_D(h)

For an unbiased estimate, h and S must be chosen independently.
2. Variance: even with an unbiased S, error_S(h) may still vary from error_D(h).

Example
Hypothesis h misclassifies 12 of the 40 examples in S:

  error_S(h) = 12/40 = 0.30

What is error_D(h)?

Estimators
Experiment:
1. Choose sample S of size n according to distribution D.
2. Measure error_S(h).
error_S(h) is a random variable (i.e., the result of an experiment).
error_S(h) is an unbiased estimator of error_D(h).
Given an observed error_S(h), what can we conclude about error_D(h)?

Confidence Intervals
If
• S contains n examples, drawn independently of h and of each other
• n ≥ 30
then
• with approximately 95% probability, error_D(h) lies in the interval

  error_S(h) ± 1.96 √( error_S(h)(1 − error_S(h)) / n )

Confidence Intervals
If
• S contains n examples, drawn independently of h and of each other
• n ≥ 30
then
• with approximately N% probability, error_D(h) lies in the interval

  error_S(h) ± z_N √( error_S(h)(1 − error_S(h)) / n )

where
  N%:  50%   68%   80%   90%   95%   98%   99%
  z_N: 0.67  1.00  1.28  1.64  1.96  2.33  2.58

error_S(h) is a Random Variable
Rerun the experiment with different randomly drawn S (of size n).
Probability of observing r misclassified examples:
[Figure: Binomial distribution for n = 40, p = 0.3]

  P(r) = [ n! / (r!(n − r)!) ] · error_D(h)^r (1 − error_D(h))^(n−r)

Binomial Probability Distribution
[Figure: Binomial distribution for n = 40, p = 0.3]

  P(r) = [ n! / (r!(n − r)!) ] · p^r (1 − p)^(n−r)

Probability P(r) of r heads in n coin flips, if p = Pr(heads).
• Expected, or mean, value of X:  E[X] ≡ Σ_{i=0}^{n} i·P(i) = np
• Variance of X:  Var(X) ≡ E[(X − E[X])²] = np(1 − p)
• Standard deviation of X:  σ_X ≡ √( E[(X − E[X])²] ) = √( np(1 − p) )

Normal Distribution Approximates Binomial
error_S(h) follows a Binomial distribution, with
• mean µ_{error_S(h)} = error_D(h)
• standard deviation

  σ_{error_S(h)} = √( error_D(h)(1 − error_D(h)) / n )

Approximate this by a Normal distribution with
• mean µ_{error_S(h)} = error_D(h)
• standard deviation

  σ_{error_S(h)} ≈ √( error_S(h)(1 − error_S(h)) / n )

Normal Probability Distribution
[Figure: Normal distribution with mean 0, standard deviation 1]

  p(x) = (1/√(2πσ²)) · e^( −(1/2)((x − µ)/σ)² )

The probability that X falls into the interval (a, b) is given by ∫_a^b p(x) dx.
• Expected, or mean, value of X:  E[X] = µ
• Variance of X:  Var(X) = σ²
• Standard deviation of X:  σ_X = σ

Normal Probability Distribution
[Figure: Normal distribution; 80% of the area (probability) lies in µ ± 1.28σ]
N% of the area (probability) lies in µ ± z_N σ:
  N%:  50%   68%   80%   90%   95%   98%   99%
  z_N: 0.67  1.00  1.28  1.64  1.96  2.33  2.58

Confidence Intervals, More Correctly
If
• S contains n examples, drawn independently of h and of each other
• n ≥ 30
then
• with approximately 95% probability, error_S(h) lies in the interval

  error_D(h) ± 1.96 √( error_D(h)(1 − error_D(h)) / n )

equivalently, error_D(h) lies in the interval

  error_S(h) ± 1.96 √( error_D(h)(1 − error_D(h)) / n )

which is approximately

  error_S(h) ± 1.96 √( error_S(h)(1 − error_S(h)) / n )

Two-Sided and One-Sided Bounds
[Figure: two-sided and one-sided bounds under the Normal distribution]
• If µ − z_N σ ≤ y ≤ µ + z_N σ with confidence N = 100(1 − α)%,
• then −∞ ≤ y ≤ µ + z_N σ with confidence N = 100(1 − α/2)%, and µ − z_N σ ≤ y ≤ +∞ with confidence N = 100(1 − α/2)%.
• Example: n = 40, r = 12
  – Two-sided, 95% confidence (α = 0.05): P(0.16 ≤ y ≤ 0.44) = 0.95
  – One-sided: P(y ≤ 0.44) = P(y ≥ 0.16) = 1 − α/2 = 0.975

Calculating Confidence Intervals
1. Pick the parameter p to estimate
  • error_D(h)
2. Choose an estimator
  • error_S(h)
3. Determine the probability distribution that governs the estimator
  • error_S(h) is governed by a Binomial distribution, approximated by a Normal distribution when n ≥ 30
4. Find the interval (L, U) such that N% of the probability mass falls in the interval
  • Use the table of z_N values

Central Limit Theorem
Consider a set of independent, identically distributed random variables Y1 ... Yn, all governed by an arbitrary probability distribution with mean µ and finite variance σ². Define the sample mean

  Ȳ ≡ (1/n) Σ_{i=1}^{n} Yi

Central Limit Theorem: as n → ∞, the distribution governing Ȳ approaches a Normal distribution with mean µ and variance σ²/n.

Difference Between Hypotheses
Test h1 on sample S1, test h2 on S2.
1. Pick the parameter to estimate

  d ≡ error_D(h1) − error_D(h2)

2. Choose an estimator

  d̂ ≡ error_S1(h1) − error_S2(h2)

3. Determine the probability distribution that governs the estimator

  σ_d̂ ≈ √( error_S1(h1)(1 − error_S1(h1))/n1 + error_S2(h2)(1 − error_S2(h2))/n2 )

4. Find the interval (L, U) such that N% of the probability mass falls in the interval

  d̂ ± z_N √( error_S1(h1)(1 − error_S1(h1))/n1 + error_S2(h2)(1 − error_S2(h2))/n2 )

Hypothesis Testing
P(error_D(h1) > error_D(h2)) = ?
• Example
  ◦ |S1| = |S2| = 100
  ◦ error_S1(h1) = 0.30
  ◦ error_S2(h2) = 0.20
  ◦ d̂ = 0.10
  ◦ σ_d̂ = 0.061
• P(d̂ < µ_d̂ + 0.10) = probability that d̂ does not overestimate d by more than 0.10
  ◦ z_N · σ_d̂ = 0.10
  ◦ z_N = 1.64
• P(d̂ < µ_d̂ + 1.64 σ_d̂) = 0.95
• I.e., reject the null hypothesis at the 0.05 level of significance

Paired t Test to Compare hA, hB
1. Partition the data into k disjoint test sets T1, T2, ..., Tk of equal size, where this size is at least 30.
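The confidence-interval recipe from the slides (measure error_S(h), then apply the Normal approximation with a z_N value) can be sketched in a few lines of Python; the function name is illustrative, not from the source.

```python
import math

def error_confidence_interval(r, n, z=1.96):
    """Two-sided confidence interval for error_D(h).

    r: number of misclassified examples
    n: sample size (n >= 30 for the Normal approximation to hold)
    z: z_N value from the table (1.96 for 95% confidence)
    """
    e = r / n                                 # sample error, error_S(h)
    half = z * math.sqrt(e * (1.0 - e) / n)   # z_N times the estimated sigma
    return e - half, e + half

# Slide example: h misclassifies 12 of the 40 examples in S.
low, high = error_confidence_interval(12, 40)  # roughly (0.16, 0.44)
```

Running the slide example reproduces the two-sided 95% interval quoted on the "Two-Sided and One-Sided Bounds" slide.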
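As a rough check of the hypothesis-testing example, the estimator d̂ and its standard deviation can be computed directly from the two sample errors; this is a minimal sketch (the function name is illustrative, not from the source).

```python
import math

def difference_estimate(e1, n1, e2, n2):
    """Estimate d = error_D(h1) - error_D(h2) and the std. dev. of d-hat.

    e1, e2: sample errors error_S1(h1), error_S2(h2)
    n1, n2: sizes of the test samples S1, S2
    """
    d_hat = e1 - e2
    sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return d_hat, sigma

# Slide example: |S1| = |S2| = 100, error_S1(h1) = 0.30, error_S2(h2) = 0.20.
d_hat, sigma = difference_estimate(0.30, 100, 0.20, 100)
z = d_hat / sigma   # about 1.64, matching the slide's z_N
```

The computed sigma of about 0.061 and z of about 1.64 match the slide's values, which is why the null hypothesis is rejected at the 0.05 significance level.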