Machine Learning 10-701
Tom M. Mitchell
Machine Learning Department, Carnegie Mellon University
March 24, 2011

Today:
- Non-linear regression
- Artificial neural networks
- Backpropagation
- Cognitive modeling
- Deep belief networks

Reading: Mitchell, Chapter 4; Bishop, Chapter 5

Artificial Neural Networks

Used to learn f: X -> Y, where
- f might be a non-linear function
- X is a vector of continuous and/or discrete variables
- Y is a vector of continuous and/or discrete variables

Represent f by a network of logistic units:
- each unit computes a logistic function, o = \sigma(w_0 + \sum_i w_i x_i), where \sigma(z) = 1/(1 + e^{-z})
- MLE: train the weights of all units to minimize the sum of squared errors of the predicted network outputs
- MAP: train to minimize the sum of squared errors plus the weight magnitudes

ALVINN (Pomerleau 1993)
[Figure: the ALVINN network, which learned to steer an autonomous vehicle from camera images.]

M(C)LE Training for Neural Networks

Consider the regression problem of learning f: X -> Y for scalar Y,
  y = f(x) + \epsilon,
where f is deterministic and \epsilon is iid noise, \epsilon \sim N(0, \sigma^2). Let's maximize the conditional data likelihood:
  W \leftarrow \arg\max_W \ln \prod_d P(y_d \mid x_d, W)
which, for Gaussian noise, is equivalent to minimizing the sum of squared errors:
  W \leftarrow \arg\min_W \sum_d \big(y_d - \hat{f}(x_d; W)\big)^2
[Figure: a learned neural network fit to the training data.]

MAP Training for Neural Networks

Same regression problem (y = f(x) + \epsilon, noise \epsilon \sim N(0, \sigma^2), f deterministic), but now assume a Gaussian prior over the weights, P(W) = N(0, \sigma_W I). Then
  W \leftarrow \arg\max_W \ln \Big[ P(W) \prod_d P(y_d \mid x_d, W) \Big]
Since \ln P(W) = -c \sum_i w_i^2 + \text{const}, this is equivalent to
  W \leftarrow \arg\min_W \Big[ c \sum_i w_i^2 + \sum_d \big(y_d - \hat{f}(x_d; W)\big)^2 \Big]
i.e., squared error plus a penalty on weight magnitudes ("weight decay").

Gradient Descent for a Single Sigmoid Unit (MLE)

Notation: x_d = input, t_d = target output, o_d = observed unit output, w_i = weight i.
With o_d = \sigma(\mathbf{w} \cdot \mathbf{x}_d), minimize
  E_D[\mathbf{w}] = \tfrac{1}{2} \sum_d (t_d - o_d)^2, \qquad
  \frac{\partial E_D}{\partial w_i} = -\sum_d (t_d - o_d)\, o_d (1 - o_d)\, x_{i,d}
(see the first code sketch below).

Backpropagation (MLE)

Notation: x_d = input, t_d = target output, o_d = observed unit output, w_{ij} = weight from unit i to unit j.
Initialize all weights to small random values. Until convergence, for each training example:
1. Propagate the input forward and compute the output o_u of every unit u.
2. For each output unit k: \delta_k \leftarrow o_k (1 - o_k)(t_k - o_k)
3. For each hidden unit h: \delta_h \leftarrow o_h (1 - o_h) \sum_k w_{hk} \delta_k
4. Update each weight: w_{ij} \leftarrow w_{ij} + \eta\, \delta_j\, x_{ij}
(see the backpropagation sketch below).

Dealing with Overfitting

Our learning algorithm involves a parameter n = the number of gradient descent iterations. How do we choose n to optimize future error, i.e., the n that minimizes the error rate of the neural net over future data? (Note the similar issue for logistic regression, decision trees, etc.)

Answer: separate the available data into a training set and a validation set.
- Use the training set to perform gradient descent.
- Set n = the number of iterations that optimizes validation-set error.
- This gives an unbiased estimate of the optimal n (but a biased estimate of the true error). (See the early-stopping sketch below.)

K-Fold Cross Validation

Idea: train multiple times, leaving out a disjoint subset of the data each time for testing; average the test-set accuracies (see the cross-validation sketch below).

  Partition the data into K disjoint subsets.
  For k = 1 to K:
    testData = the k-th subset
    h = classifier trained* on all data except testData
    accuracy(k) = accuracy of h on testData
  end
  FinalAccuracy = mean of the K recorded test-set accuracies

  * might withhold some of this data to choose the number of gradient descent steps

Leave-One-Out Cross Validation

This is just K-fold cross validation where each subset contains a single example (K = number of examples); the procedure is otherwise identical to the one above.

[Figure residue: learned network weights (w_0, ...) for outputs "left", "straight", "right", "up" — the face-pose recognition example of Mitchell, Chapter 4.]
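Code sketches. Below is a minimal NumPy sketch of batch gradient descent for a single sigmoid unit under the M(C)LE and MAP objectives above. The function name train_sigmoid_unit, the learning rate eta, and the penalty coefficient c are illustrative choices, not from the slides; c = 0 recovers the pure MLE objective.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_unit(X, t, eta=0.1, n_iters=1000, c=0.0):
    """Batch gradient descent for one sigmoid unit on
    E(w) = c * sum_i w_i^2 + sum_d (t_d - o_d)^2,
    where o_d = sigmoid(w . x_d). c = 0 gives the MLE objective;
    c > 0 gives the MAP objective with a Gaussian prior on the weights.
    X: (D, n_features) inputs, t: (D,) targets in (0, 1)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # x_0 = 1 carries the bias w_0
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        o = sigmoid(Xb @ w)                        # observed unit outputs o_d
        # dE/dw_i = -2 sum_d (t_d - o_d) o_d (1 - o_d) x_{i,d} + 2 c w_i
        grad = -2 * Xb.T @ ((t - o) * o * (1 - o)) + 2 * c * w
        w -= eta * grad
    return w
```

Calling train_sigmoid_unit(X, t, c=0.01) then corresponds to the weight-decay objective on the MAP slide.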
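Next, a sketch of the backpropagation procedure above for a network with one hidden layer of sigmoid units, sigmoid outputs, squared-error loss, and per-example (stochastic) weight updates. It assumes NumPy; the names (backprop_train, n_hidden, eta) and the layer sizes are illustrative, not from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_train(X, T, n_hidden=3, eta=0.05, n_iters=5000, seed=0):
    """Backpropagation for one hidden layer of sigmoid units.
    X: (D, n_in) inputs, T: (D, n_out) targets in (0, 1)."""
    rng = np.random.default_rng(seed)
    D, n_in = X.shape
    n_out = T.shape[1]
    # initialize weights to small random values; row 0 holds the bias w_0
    W1 = rng.uniform(-0.05, 0.05, (n_in + 1, n_hidden))
    W2 = rng.uniform(-0.05, 0.05, (n_hidden + 1, n_out))
    for _ in range(n_iters):
        d = rng.integers(D)                         # pick one training example
        x = np.append(1.0, X[d])                    # x_0 = 1 for the bias
        h = sigmoid(x @ W1)                         # hidden unit outputs
        hb = np.append(1.0, h)
        o = sigmoid(hb @ W2)                        # network outputs o_k
        delta_o = o * (1 - o) * (T[d] - o)          # output-unit errors delta_k
        delta_h = h * (1 - h) * (W2[1:] @ delta_o)  # hidden-unit errors delta_h
        W2 += eta * np.outer(hb, delta_o)           # w_ij <- w_ij + eta delta_j x_ij
        W1 += eta * np.outer(x, delta_h)
    return W1, W2
```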
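A sketch of choosing n by validation-set error (early stopping), as described under Dealing with Overfitting, applied to the single sigmoid unit. It assumes NumPy; the split fraction val_frac and the function name are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def choose_n_by_validation(X, t, eta=0.1, max_iters=2000, val_frac=0.3, seed=0):
    """Hold out a validation set, run gradient descent on the training
    split, and return the iteration count n with lowest validation SSE."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(val_frac * len(X))
    val, tr = idx[:n_val], idx[n_val:]
    Xb = np.hstack([np.ones((len(X), 1)), X])      # x_0 = 1 for the bias w_0
    w = np.zeros(Xb.shape[1])
    best_n, best_err = 0, np.inf
    for n in range(1, max_iters + 1):
        o = sigmoid(Xb[tr] @ w)
        # one descent step on E = 1/2 sum_d (t_d - o_d)^2 over the training split
        w += eta * Xb[tr].T @ ((t[tr] - o) * o * (1 - o))
        val_err = np.sum((t[val] - sigmoid(Xb[val] @ w)) ** 2)
        if val_err < best_err:
            best_n, best_err = n, val_err
    return best_n
```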
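Finally, a sketch of the K-fold cross validation procedure above. Here train_fn is a hypothetical callback that returns a classifier exposing a predict method; that interface is an assumption for illustration. Setting K = len(X) gives leave-one-out cross validation.

```python
import numpy as np

def k_fold_accuracy(X, y, train_fn, K=10, seed=0):
    """Partition the data into K disjoint subsets; for each k, train on
    all data except the k-th subset, test on the k-th subset, and
    return the mean of the K test-set accuracies."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), K)
    accs = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        h = train_fn(X[train], y[train])           # hypothetical training callback
        accs.append(np.mean(h.predict(X[test]) == y[test]))
    return np.mean(accs)
```

As the slide notes, part of each training split might itself be withheld as a validation set to choose the number of gradient descent steps.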