Pitt CS 2750 - Machine Learning

Lecture 3: Evaluation of predictors
Milos Hauskrecht
[email protected]
Sennott Square, x4-8845
http://www.cs.pitt.edu/~milos/courses/cs2750/

Administration
• Homework 1:
  – Due next week on Wednesday.
  – Report.
  – Programs in Matlab.

Design cycle
• Data → Feature selection → Model selection → Learning → Evaluation.
• Feature selection and model selection require prior knowledge.

Evaluation
• Use pristine test data held out from the data set.
  – Reason: overfitting can drive the training error to zero, so it makes sense to evaluate only on the test error.
  – Alternative: cross-validation.
• Three evaluation questions:
  – Question 1: How far is the test error from the true error? The test error approximates the generalization (true) error.
  – Question 2: How do we compare two different predictors? Which one is better than the other?
  – Question 3: How do we compare two different learning algorithms? Which one is better than the other?
• Problem: we cannot be 100% sure about the goodness of the test-error approximation.
• Solution: statistical methods, confidence intervals.
• These are based on the central limit theorem: the sum of a large number of random samples is approximately normally distributed.
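The central limit theorem quoted above is easy to verify empirically. Below is a minimal sketch in Python (the course itself uses Matlab; the distribution and all numbers here are illustrative choices, not from the lecture): averaging n draws from a decidedly non-normal uniform distribution still yields sample means distributed approximately N(µ, σ²/n).

```python
import numpy as np

# Illustrative sketch of the central limit theorem (assumed setup, not from
# the lecture): draws come from Uniform(0, 1), which has mean 0.5 and
# variance 1/12, yet the mean of n draws is approximately N(0.5, (1/12)/n).
rng = np.random.default_rng(0)

n = 100          # sample size
trials = 20000   # number of independent sample means

sample_means = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)

print(f"mean of sample means: {sample_means.mean():.4f}  (theory: 0.5)")
print(f"variance of sample means: {sample_means.var():.6f}  (theory: {1 / 12 / n:.6f})")
```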
How far is the test error from the true error?
[Figure: normal densities N(µ, σ²) with µ = 0 and σ = 1, 2, 4.]

Central limit theorem
• Let random variables X₁, X₂, …, Xₙ form a random sample from a distribution with mean µ and variance σ². If the sample size n is large, then

  Σᵢ₌₁ⁿ Xᵢ ≈ N(nµ, nσ²),  or  (1/n) Σᵢ₌₁ⁿ Xᵢ ≈ N(µ, σ²/n).

• Effect of increasing the sample size n on the sample mean:
[Figure: densities of the sample mean for n = 30, 50, 100, with µ = 0, σ² = 4; the density concentrates around µ as n grows.]

Sample mean
• The sample mean X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ ≈ N(µ, σ²/n) is normally distributed around the true mean.
• We can transform the sample mean as follows:

  z = (X̄ − µ) / (σ/√n) ≈ N(0, 1).

• Example: transformation of X̄ ≈ N(5, 4) to z ≈ N(0, 1).

Confidence intervals
• Assume N(0, 1). We are interested in:
  – finding the symmetric interval [−z_p, z_p] around the mean such that the probability of seeing a sample from it is p;
  – measuring the distance of the end points from 0 in terms of σ = 1.
• The values z_p are tabulated. Example: for p = 0.95, z_p = 1.96, so with confidence 0.95 we see values in the interval [−1.96, 1.96].
• Back to the case X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ ≈ N(µ, σ²/n), z = (X̄ − µ)/(σ/√n) ≈ N(0, 1).
• The probability mass under the normal curve for a symmetric interval around the mean is invariant when interval distances are measured in terms of the standard deviation:

  X̄ ∈ [µ − z_p σ/√n, µ + z_p σ/√n]  with probability p.

• For p = 0.95 and z_p = 1.96:

  X̄ ∈ [µ − 1.96 σ/√n, µ + 1.96 σ/√n],  or equivalently  µ ∈ [X̄ − 1.96 σ/√n, X̄ + 1.96 σ/√n].

Confidence interval: unknown variance
• Problem: typically the variance σ² is not known.
• Solution: estimate the variance from the sample:

  sₙ² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)².

• Assume the sample mean falls into the interval
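A quick empirical check of the known-variance interval above. This is a sketch with made-up values of µ, σ and n, using z_p = 1.96 for p = 0.95 as tabulated in the slides: the interval X̄ ± 1.96 σ/√n should contain µ in about 95% of repeated samples.

```python
import numpy as np

# Coverage check for the known-variance 95% confidence interval
# (mu, sigma and n are arbitrary illustrative choices).
rng = np.random.default_rng(1)

mu, sigma, n = 5.0, 2.0, 50
z_p = 1.96            # tabulated value for p = 0.95
trials = 10000

samples = rng.normal(mu, sigma, size=(trials, n))
xbar = samples.mean(axis=1)
half_width = z_p * sigma / np.sqrt(n)

coverage = np.mean(np.abs(xbar - mu) <= half_width)
print(f"empirical coverage: {coverage:.3f}  (target: 0.95)")
```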
centered at the mean:

  X̄ ∈ [µ − t_p sₙ/√n, µ + t_p sₙ/√n].

• Or, equivalently, that the mean falls into the interval centered around the sample mean:

  µ ∈ [X̄ − t_p sₙ/√n, X̄ + t_p sₙ/√n].

• This happens with some probability p that depends on t_p.

Student distribution
• Let

  t = (X̄ − µ) / (sₙ/√n),  where sₙ² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)².

• The difference from the known-variance case:
  – t is not normally distributed; instead it follows a Student distribution (t distribution).
  – The Student distribution has one additional parameter: the degrees of freedom.
  – For sₙ as above, t ≈ t(n − 1), i.e. t has n − 1 degrees of freedom.
[Figure: Student distribution t(5) versus the normal N(0, 1).]
• For the Student distribution with k degrees of freedom, as k → ∞ it approaches N(0, 1).
[Figure: t(k) densities for k = 5, 20, 100.]

Confidence intervals: summary
• Select a confidence level (probability) p, e.g. p = 0.95.
• Compute the interval into which the sample mean falls with that confidence:
  – For unknown mean and known variance:

    X̄ ∈ [µ − z_p σ/√n, µ + z_p σ/√n]  and  µ ∈ [X̄ − z_p σ/√n, X̄ + z_p σ/√n].

    E.g. for p = 0.95: µ ∈ [X̄ − 1.96 σ/√n, X̄ + 1.96 σ/√n].
  – For unknown mean and unknown variance:

    X̄ ∈ [µ − t_p(n−1) sₙ/√n, µ + t_p(n−1) sₙ/√n]  and  µ ∈ [X̄ − t_p(n−1) sₙ/√n, X̄ + t_p(n−1) sₙ/√n].
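The unknown-variance recipe can be checked the same way. This sketch hardcodes the tabulated value t_p(29) = 2.045 for p = 0.95 and n = 30, as quoted in the lecture, instead of looking it up; the distribution parameters are again illustrative.

```python
import numpy as np

# Coverage check for the unknown-variance 95% confidence interval:
# sigma is replaced by the sample estimate s_n, and z_p by the Student-t
# value t_p(n - 1); for p = 0.95 and n = 30 the tabulated value is 2.045.
rng = np.random.default_rng(2)

mu, sigma, n = 0.0, 3.0, 30
t_p = 2.045
trials = 10000

samples = rng.normal(mu, sigma, size=(trials, n))
xbar = samples.mean(axis=1)
s_n = samples.std(axis=1, ddof=1)     # divides by n - 1, as in the slides
half_width = t_p * s_n / np.sqrt(n)

coverage = np.mean(np.abs(xbar - mu) <= half_width)
print(f"empirical coverage: {coverage:.3f}  (target: 0.95)")
```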
    E.g. for p = 0.95 and n = 30: µ ∈ [X̄ − 2.045 sₙ/√n, X̄ + 2.045 sₙ/√n].

So how different can the test error be?

Comparison of two predictors
• Predictor 1 uses function f₁(x) to predict the ys; Predictor 2 uses function f₂(x).
• Test data are used to approximate the true errors:

  Error₁ = (1/n) Σᵢ₌₁ⁿ (yᵢ − f₁(xᵢ))²,   Error₂ = (1/n) Σᵢ₌₁ⁿ (yᵢ − f₂(xᵢ))².

• Assume that the sample size n is sufficiently large.
• Assume that we observed Error₁ > Error₂, or that ∆E = Error₁ − Error₂ > 0.
• Question: How sure are we that Predictor 2 is better than Predictor 1 in terms of true errors?

Comparison of two predictors (cont.)
• True errors:

  TrueError₁ = E_(x,y)[(y − f₁(x))²],   TrueError₂ = E_(x,y)[(y − f₂(x))²].

• Predictor 2 is better than Predictor 1 if TrueError₁ > TrueError₂, or equivalently if

  µ_diff = E_(x,y)[(y − f₁(x))² − (y − f₂(x))²] > 0.

• Problem: we do not know the true mean error difference µ_diff.
• But we can approximate it with the sample of paired squared differences for the test sample:

  ∆E = (1/n) Σᵢ₌₁ⁿ [(yᵢ − f₁(xᵢ))² − (yᵢ − f₂(xᵢ))²].

Comparison of two predictors (cont.)
• True error difference: µ_diff. Error difference based on the sample of size n: ∆E.
• Assume Xᵢ = (yᵢ − f₁(xᵢ))² − (yᵢ − f₂(xᵢ))² is a random variable.
• Central limit result:

  X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ = ∆E ≈ N(µ_diff, σ²/n).
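The paired-difference comparison above can be sketched end to end. Everything here is synthetic and illustrative: the data, the two predictors f1 and f2, and the use of z_p = 1.96 (justified by the central limit result for large n). We form the paired squared differences Xᵢ, average them to get ∆E, and attach a 95% confidence interval.

```python
import numpy as np

# Synthetic comparison of two predictors (all choices illustrative).
rng = np.random.default_rng(3)

n = 500
x = rng.uniform(-1.0, 1.0, size=n)
y = 2.0 * x + rng.normal(0.0, 0.3, size=n)   # hypothetical test set

def f1(x):
    return 1.5 * x                           # predictor 1 (worse fit)

def f2(x):
    return 2.0 * x                           # predictor 2 (better fit)

# Paired squared differences X_i = (y_i - f1(x_i))^2 - (y_i - f2(x_i))^2
diffs = (y - f1(x)) ** 2 - (y - f2(x)) ** 2

delta_e = diffs.mean()                       # sample estimate of mu_diff
s_n = diffs.std(ddof=1)
half_width = 1.96 * s_n / np.sqrt(n)         # large n: normal approximation

print(f"Delta_E = {delta_e:.4f}")
print(f"95% CI for mu_diff: [{delta_e - half_width:.4f}, {delta_e + half_width:.4f}]")
```

If the whole interval lies above zero, we can conclude with roughly 95% confidence that Predictor 2 has the smaller true error.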

