
Evaluating Hypotheses
IEEE Expert, October 1996

Evaluating Hypotheses
• Sample error, true error
• Confidence intervals for observed hypothesis error
• Estimators
• Binomial distribution, Normal distribution, Central Limit Theorem
• Paired t tests
• Comparing learning methods

Evaluating Hypotheses and Learners
Consider hypotheses H1 and H2 learned by learners L1 and L2
• How to learn H and estimate accuracy with limited data?
• How well does the observed accuracy of H over a limited sample estimate accuracy over unseen data?
• If H1 outperforms H2 on the sample, will H1 outperform H2 in general?
• Same conclusion for L1 and L2?

Two Definitions of Error
The true error of hypothesis h with respect to target function f and distribution D is the probability that h will misclassify an instance drawn at random according to D.

    error_D(h) ≡ Pr_{x∈D}[f(x) ≠ h(x)]

The sample error of h with respect to target function f and data sample S is the proportion of examples h misclassifies

    error_S(h) ≡ (1/n) Σ_{x∈S} δ(f(x) ≠ h(x))

where δ(f(x) ≠ h(x)) is 1 if f(x) ≠ h(x), and 0 otherwise.
How well does error_S(h) estimate error_D(h)?

Problems Estimating Error
1. Bias: If S is the training set, error_S(h) is optimistically biased

       bias ≡ E[error_S(h)] − error_D(h)

   For an unbiased estimate, h and S must be chosen independently
2. Variance: Even with an unbiased S, error_S(h) may still vary from error_D(h)

Example
Hypothesis h misclassifies 12 of the 40 examples in S

    error_S(h) = 12/40 = .30

What is error_D(h)?

Estimators
Experiment:
1. choose sample S of size n according to distribution D
2. measure error_S(h)
error_S(h) is a random variable (i.e., the result of an experiment)
error_S(h) is an unbiased estimator for error_D(h)
Given observed error_S(h), what can we conclude about error_D(h)?

Confidence Intervals
If
• S contains n examples, drawn independently of h and of each other
• n ≥ 30
Then
• With approximately 95% probability, error_D(h) lies in the interval

    error_S(h) ± 1.96 √(error_S(h)(1 − error_S(h)) / n)

Confidence Intervals
If
• S contains n examples, drawn independently of h and of each other
• n ≥ 30
Then
• With approximately N% probability, error_D(h) lies in the interval

    error_S(h) ± z_N √(error_S(h)(1 − error_S(h)) / n)

where
    N%:  50%  68%  80%  90%  95%  98%  99%
    z_N: 0.67 1.00 1.28 1.64 1.96 2.33 2.58

error_S(h) is a Random Variable
Rerun the experiment with a different randomly drawn S (of size n)
Probability of observing r misclassified examples:
[Figure: Binomial distribution for n = 40, p = 0.3]

    P(r) = n! / (r!(n − r)!) · error_D(h)^r (1 − error_D(h))^(n−r)

Binomial Probability Distribution
[Figure: Binomial distribution for n = 40, p = 0.3]

    P(r) = n! / (r!(n − r)!) · p^r (1 − p)^(n−r)

Probability P(r) of r heads in n coin flips, if p = Pr(heads)
• Expected, or mean, value of X:  E[X] ≡ Σ_{i=0}^{n} i·P(i) = np
• Variance of X:  Var(X) ≡ E[(X − E[X])²] = np(1 − p)
• Standard deviation of X:  σ_X ≡ √(E[(X − E[X])²]) = √(np(1 − p))

Normal Distribution Approximates Binomial
error_S(h) follows a Binomial distribution, with
• mean µ_{error_S(h)} = error_D(h)
• standard deviation σ_{error_S(h)} = √(error_D(h)(1 − error_D(h)) / n)
Approximate this by a Normal distribution with
• mean µ_{error_S(h)} = error_D(h)
• standard deviation σ_{error_S(h)} ≈ √(error_S(h)(1 − error_S(h)) / n)

Normal Probability Distribution
[Figure: Normal distribution with mean 0, standard deviation 1]

    p(x) = (1 / √(2πσ²)) e^(−(1/2)((x − µ)/σ)²)

The probability that X will fall into the interval (a, b) is given by ∫_a^b p(x) dx
• Expected, or mean, value of X:  E[X] = µ
• Variance of X:  Var(X) = σ²
• Standard deviation of X:  σ_X = σ

Normal Probability Distribution
[Figure: standard Normal density with shaded central region]
80% of the area (probability) lies in µ ± 1.28σ
N% of the area (probability) lies in µ ± z_N σ
    N%:  50%  68%  80%  90%  95%  98%  99%
    z_N: 0.67 1.00 1.28 1.64 1.96 2.33 2.58

Confidence Intervals, More Correctly
If
• S contains n examples, drawn independently of h and of each other
• n ≥ 30
Then
• With approximately 95% probability, error_S(h) lies in the interval

    error_D(h) ± 1.96 √(error_D(h)(1 − error_D(h)) / n)

equivalently, error_D(h) lies in the interval

    error_S(h) ± 1.96 √(error_D(h)(1 − error_D(h)) / n)

which is approximately

    error_S(h) ± 1.96 √(error_S(h)(1 − error_S(h)) / n)

Two-Sided and One-Sided Bounds
[Figures: two-sided and one-sided tails of the standard Normal distribution]
• If µ − z_N σ ≤ y ≤ µ + z_N σ with confidence N = 100(1 − α)%
• Then −∞ ≤ y ≤ µ + z_N σ with confidence N = 100(1 − α/2)%
  and µ − z_N σ ≤ y ≤ +∞ with confidence N = 100(1 − α/2)%
• Example: n = 40, r = 12
  – Two-sided, 95% confidence (α = 0.05): P(0.16 ≤ y ≤ 0.44) = 0.95
  – One-sided: P(y ≤ 0.44) = P(y ≥ 0.16) = 1 − α/2 = 0.975

Calculating Confidence Intervals
1. Pick the parameter p to estimate
   • error_D(h)
2. Choose an estimator
   • error_S(h)
3. Determine the probability distribution that governs the estimator
   • error_S(h) is governed by a Binomial distribution, approximated by a Normal distribution when n ≥ 30
4. Find the interval (L, U) such that N% of the probability mass falls in the interval
   • Use the table of z_N values

Central Limit Theorem
Consider a set of independent, identically distributed random variables Y1 ... Yn, all governed by an arbitrary probability distribution with mean µ and finite variance σ². Define the sample mean

    Ȳ ≡ (1/n) Σ_{i=1}^{n} Y_i

Central Limit Theorem: As n → ∞, the distribution governing Ȳ approaches a Normal distribution with mean µ and variance σ²/n.

Difference Between Hypotheses
Test h1 on sample S1, test h2 on S2
1. Pick the parameter to estimate

       d ≡ error_D(h1) − error_D(h2)

2. Choose an estimator

       d̂ ≡ error_S1(h1) − error_S2(h2)

3. Determine the probability distribution that governs the estimator

       σ_d̂ ≈ √(error_S1(h1)(1 − error_S1(h1)) / n1 + error_S2(h2)(1 − error_S2(h2)) / n2)

4. Find the interval (L, U) such that N% of the probability mass falls in the interval

       d̂ ± z_N √(error_S1(h1)(1 − error_S1(h1)) / n1 + error_S2(h2)(1 − error_S2(h2)) / n2)

Hypothesis Testing
P(error_D(h1) > error_D(h2)) = ?
• Example:
  ◦ |S1| = |S2| = 100
  ◦ error_S1(h1) = 0.30
  ◦ error_S2(h2) = 0.20
  ◦ d̂ = 0.10
  ◦ σ_d̂ = 0.061
• P(d̂ < µ_d̂ + 0.10) = probability that d̂ does not overestimate d by more than 0.10
  ◦ z_N · σ_d̂ = 0.10
  ◦ z_N = 1.64
• P(d̂ < µ_d̂ + 1.64σ_d̂) = 0.95
• I.e., reject the null hypothesis at the 0.05 level of significance

Paired t test to compare hA, hB
1. Partition data into k disjoint test sets T1, T2, ..., Tk of equal size, where this
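The confidence-interval recipe above (compute error_S(h), then add ± z_N √(error_S(h)(1 − error_S(h))/n)) can be sketched in Python. The z_N table and the n = 40, r = 12 example come from the slides; the function name is my own:

```python
import math

# z_N values for common confidence levels (from the slides' table)
Z_N = {50: 0.67, 68: 1.00, 80: 1.28, 90: 1.64, 95: 1.96, 98: 2.33, 99: 2.58}

def error_confidence_interval(r, n, confidence=95):
    """N% confidence interval for error_D(h), given r misclassified
    examples out of n (Normal approximation, valid for n >= 30)."""
    error_s = r / n  # sample error error_S(h)
    half = Z_N[confidence] * math.sqrt(error_s * (1 - error_s) / n)
    return error_s - half, error_s + half

# Slide example: h misclassifies 12 of 40 examples, 95% confidence
lo, hi = error_confidence_interval(12, 40, 95)
print(f"error_S(h) = {12/40:.2f}, 95% interval = ({lo:.2f}, {hi:.2f})")
# matches the slides' two-sided example: (0.16, 0.44)
```

The same function reproduces the one-sided bounds on the Two-Sided and One-Sided Bounds slide: each endpoint of the 95% two-sided interval is a 97.5% one-sided bound.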
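The Binomial facts used above (E[X] = np, Var(X) = np(1 − p)) can be checked numerically for the running n = 40, p = 0.3 example. This sketch is mine, not part of the slides:

```python
import math

def binomial_pmf(r, n, p):
    """P(r) = n!/(r!(n-r)!) * p^r * (1-p)^(n-r)"""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 40, 0.3  # the slides' running example
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

mean = sum(r * pr for r, pr in enumerate(pmf))
var = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))

print(f"sum of P(r)  = {sum(pmf):.4f}")                            # ~1.0000
print(f"mean         = {mean:.2f}  (np = {n * p:.2f})")            # 12.00
print(f"variance     = {var:.2f}   (np(1-p) = {n*p*(1-p):.2f})")   # 8.40
```

This also confirms why the distribution pictured in the figure peaks near r = 12: the mean of the Binomial is np = 40 × 0.3.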
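The Hypothesis Testing example can be reproduced directly. The numbers follow the slides; the helper function and its name are my own:

```python
import math

def difference_stats(e1, n1, e2, n2):
    """Estimated difference d_hat between two sample errors and its
    standard deviation sigma_d_hat, per the Normal approximation."""
    d_hat = e1 - e2
    sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return d_hat, sigma

# Slide example: |S1| = |S2| = 100, error_S1(h1) = 0.30, error_S2(h2) = 0.20
d_hat, sigma = difference_stats(0.30, 100, 0.20, 100)
z = d_hat / sigma  # how many standard deviations d_hat lies above zero
print(f"d_hat = {d_hat:.2f}, sigma = {sigma:.3f}, z = {z:.2f}")
# sigma ~ 0.061 and z ~ 1.64, matching the slides' one-sided 0.95 confidence
```

Since the observed z of about 1.64 equals the one-sided z_N for 95% confidence, the slides conclude that the null hypothesis (no difference) can be rejected at the 0.05 significance level.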


WSU CSE 6363 - Evaluating Hypotheses