CMU CS 15381 - Learning Conclusion: Cross-Validation

Learning Conclusion: Cross-Validation
Bayes Nets Intro: Representing and Reasoning about Uncertainty

Final Considerations: Avoiding Overfitting
• We have a choice of different techniques: decision trees, neural networks, nearest neighbors, Bayes classifier, …
• For each we have different levels of complexity:
– Depth of trees
– Number of layers and hidden units
– Number of neighbors in K-NN
– …
• How do we choose the right one?
• Overfitting: a complex enough model (e.g., enough units in a neural network, a large enough tree, …) will always be able to fit the training data well.

Example
• Construct a predictor of y from x given this training data.
[Figure: the same training data fit with linear, quadratic, and piecewise-linear models]
• Which model is best for predicting y from x?
• We want the model that generates the best predictions on future data, not necessarily the one with the lowest error on the training data.

Using a Test Set
1. Use a portion (e.g., 30%) of the data as test data.
2. Fit a model to the remaining training data.
3. Evaluate the error on the test data.
[Figure: the three models evaluated on the test set, with errors 2.4, 2.2, and 0.9]

Using a Test Set:
+ Simple
- Wastes a large % of the data
- May get lucky with one particular subset of the data

"Leave One Out" Cross-Validation
• For k = 1 to R:
– Train on all the data, leaving out (xk, yk)
– Evaluate the error on (xk, yk)
• Report the average error after trying all the data points.
[Figure: leave-one-out errors for the three models: 2.12, 0.962, and 3.33]
Note: numerical examples in this and subsequent slides are from A. Moore.

"Leave One Out" Cross-Validation:
+ Does not waste data
+ Averages over a large number of trials
- Expensive

K-Fold Cross-Validation
• Randomly divide the data set into K subsets.
• For each subset S:
– Train on the data not in S
– Test on the data in S
• Return the average error over the K subsets.
[Figure: K = 3, each color corresponds to a subset; errors 2.05, 1.11, and 2.93]

Cross-Validation Summary
• Test Set: + simple/efficient; - wastes a lot of data, poor predictor of future performance
• Leave One Out: + does not waste data; - inefficient
• K-Fold: - wastes 1/K of the data (but only 1/K!); - K times slower than Test Set (but only K times!)

Classification Problems
• Exactly the same approaches apply for cross-validation, except that the error is the number of data points that are misclassified.
[Figure: a two-class data set with regions y = 1 and y = 0]

Example: Training a Neural Net
• Train neural nets with different numbers of hidden units (more and more complex NNs).
• For each NN, evaluate the error using K-fold cross-validation.
• Choose the one with the minimum cross-validation error.
[Figure: cross-validation error vs. number of hidden units, with the minimum marked]

Summary (R&N Chapter 20)
• Learning algorithms:
– Naïve Bayes
– Decision trees
– Nearest neighbors
– Neural networks
• Validation:
– The error on the training set should never be used directly to evaluate a learning algorithm on a data set.
– Validation on a test set.
– Cross-validation to avoid wasting data: leave one out; K-fold (code sketches of these procedures follow below).
– Used for: finding the best configuration of a learned model (complexity of a neural network, K in K-NN, etc.) and deciding between different learning algorithms (neural networks, nearest neighbors, decision trees, …).
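The test-set procedure above is easy to sketch in code. The following is a minimal illustration on synthetic data; the data-generating function, the 70/30 split sizes, and the polynomial models are illustrative assumptions, not the slide's example, so the error values will differ from the 2.4/2.2/0.9 in the figure.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 40)                     # hypothetical inputs
y = np.sin(x) + 0.3 * rng.normal(size=x.size)      # hypothetical targets

# 1. Use a portion (here ~30%) of the data as test data.
order = rng.permutation(x.size)
test, train = order[:12], order[12:]

for degree, name in [(1, "linear"), (2, "quadratic")]:
    # 2. Fit the model to the remaining training data.
    coeffs = np.polyfit(x[train], y[train], degree)
    # 3. Evaluate the error on the test data.
    mse = np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2)
    print(f"{name}: test error = {mse:.3f}")
```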
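Leave-one-out cross-validation, as described above, trains R times and tests each time on the single held-out point. A minimal sketch under the same hypothetical polynomial setup:

```python
import numpy as np

def loocv_error(x, y, degree):
    """Average squared error over R fits, each leaving out one (x_k, y_k)."""
    errors = []
    for k in range(x.size):
        keep = np.arange(x.size) != k              # all the data except point k
        coeffs = np.polyfit(x[keep], y[keep], degree)
        pred = np.polyval(coeffs, x[k])            # evaluate on the held-out point
        errors.append((pred - y[k]) ** 2)
    return np.mean(errors)                         # report the average error

# e.g. loocv_error(x, y, 1) vs. loocv_error(x, y, 2) compares linear and
# quadratic fits on data arrays x, y such as those in the sketch above.
```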
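K-fold cross-validation can be sketched the same way; the only changes are the random partition into K subsets and the per-fold averaging. Again a minimal illustration under the same assumptions; a comment notes the change needed for classification problems.

```python
import numpy as np

def kfold_error(x, y, degree, K=3, seed=0):
    """Average test error over K randomly chosen folds."""
    # Randomly divide the data set into K subsets.
    folds = np.array_split(np.random.default_rng(seed).permutation(x.size), K)
    errors = []
    for fold in folds:
        keep = np.ones(x.size, dtype=bool)
        keep[fold] = False                          # train on the data not in S
        coeffs = np.polyfit(x[keep], y[keep], degree)
        pred = np.polyval(coeffs, x[fold])          # test on the data in S
        errors.append(np.mean((pred - y[fold]) ** 2))
        # For classification, use the fraction of misclassified points here
        # instead of the squared error.
    return np.mean(errors)                          # average over the K subsets
```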
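The neural-net example above is a model-selection loop. A sketch of that loop, assuming scikit-learn is available; the candidate hidden-unit counts and the helper name pick_hidden_units are illustrative choices, not part of the lecture.

```python
# Model selection by K-fold cross-validation: try increasingly complex
# nets and keep the hidden-unit count with the lowest CV error.
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def pick_hidden_units(X, y, candidates=(2, 4, 8, 16, 32), K=5):
    cv_error = {}
    for h in candidates:
        net = MLPClassifier(hidden_layer_sizes=(h,), max_iter=2000,
                            random_state=0)
        scores = cross_val_score(net, X, y, cv=K)   # K-fold accuracy
        cv_error[h] = 1.0 - scores.mean()           # misclassification rate
    return min(cv_error, key=cv_error.get)          # minimum CV error wins
```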
Bayes Nets: Representing and Reasoning about Uncertainty

Bayes Nets
• Material covered in Russell & Norvig, Chapter 14.
• Not covered in the lectures: networks with continuous variables.
• Not covered in the chapter: d-separation.

Reasoning with Uncertainty
• Most real-world problems deal with uncertain information:
– Diagnosis: likely disease given observed symptoms
– Equipment repair: likely component failure given sensor readings
– Help desk: likely operation based on past operations
• We saw how to use probability to represent uncertainty and to perform queries such as inference:
– Diagnosis: Prob(disease | observed symptoms)
– Equipment repair: Prob(component | sensor readings)
– Help desk: Prob(likely operation | past operations)
• We saw that representing probability distributions can be inefficient (or intractable) for large problems.
• Today: Bayes nets provide a powerful tool for making reasoning with uncertainty manageable, by taking advantage of independence relations between variables.
• For example: knowing that the hand brake is operational does not help diagnose why the engine does not start!
• We'll start by reviewing our key probability tools.

Probability Reminder
• Conditional probability for two events A and B:
P(A|B) = P(A,B) / P(B)
• Chain rule:
P(A,B) = P(A|B) P(B)

Probability Reminder
• Conditional probability for two variables X and Y, for any values x, y:
P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y)
• Chain rule:
P(X=x, Y=y) = P(X=x | Y=y) P(Y=y)

The Joint Distribution
• Joint distribution = the collection of all the probabilities P(X=x, Y=y, Z=z, …) for all possible combinations of values.
• For m binary variables, its size is 2^m.

X   Y   Z   Prob
F   F   F   0.08
F   F   T   0.07
F   T   F   0.15
F   T   T   0.10
T   F   F   0.08
T   F   T   0.20
T   T   F   0.22
T   T   T   0.10

• Any query can be computed from the joint distribution:
– Marginal distributions, e.g., P(X=True), P(X=False) or P(Y=True), P(Y=False)
– Conditional distributions, e.g., P(X=True | Y=True) = P(X=True, Y=True) / P(Y=True)
• In general: P(E1 | E2) = P(E1, E2) / P(E2), where P(E2) = Σ P(joint entries) over the entries that match E2, and E1 and E2 are assignments of values to subsets of the variables.
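As a concrete check of the queries above, the joint table can be stored as a dictionary and any marginal or conditional probability computed by summing the matching entries. A minimal sketch using the slide's numbers; the helper name prob and the dictionary encoding are illustrative choices.

```python
# The slide's joint distribution, keyed by (X, Y, Z) truth values.
joint = {
    (False, False, False): 0.08, (False, False, True): 0.07,
    (False, True,  False): 0.15, (False, True,  True): 0.10,
    (True,  False, False): 0.08, (True,  False, True): 0.20,
    (True,  True,  False): 0.22, (True,  True,  True): 0.10,
}

def prob(evidence):
    """P(E) = sum of the joint entries that match the partial assignment E."""
    return sum(p for (x, y, z), p in joint.items()
               if all(evidence.get(name, value) == value
                      for name, value in (("X", x), ("Y", y), ("Z", z))))

p_y  = prob({"Y": True})                      # marginal:     0.57
p_xy = prob({"X": True, "Y": True})           # joint:        0.32
print(p_xy / p_y)                             # P(X=T | Y=T) = 0.32 / 0.57 ≈ 0.561
# Chain-rule check: P(X=T, Y=T) equals P(X=T | Y=T) * P(Y=T).
```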

