CMU CS 10701 - Artificial Neural Networks to learn f: X → Y

School: Carnegie Mellon University

Course: CS 10701 - Introduction to Machine Learning

Pages: 13

Machine Learning 10-701
Tom M. Mitchell
Machine Learning Department
Carnegie Mellon University
March 24, 2011

Today:
• Non-linear regression
• Artificial neural networks
• Backpropagation
• Cognitive modeling
• Deep belief networks

Reading:
• Mitchell: Chapter 4
• Bishop: Chapter 5

Artificial Neural Networks to learn f: X → Y
• f might be a non-linear function
• X: (vector of) continuous and/or discrete variables
• Y: (vector of) continuous and/or discrete variables
• Represent f by a network of logistic units
• Each unit is a logistic function
• MLE: train the weights of all units to minimize the sum of squared errors of the predicted network outputs
• MAP: train to minimize the sum of squared errors plus the weight magnitudes

ALVINN [Pomerleau 1993]

M(C)LE Training for Neural Networks
• Consider a regression problem f: X → Y, for scalar Y: $y = f(x) + \varepsilon$, where $f$ is deterministic and the noise $\varepsilon \sim N(0, \sigma_\varepsilon)$ is assumed iid
• Let's maximize the conditional data likelihood of the learned neural network. Writing $t_d$ for the target and $o_d$ for the network output on training input $x_d$, Gaussian noise makes each $\ln P(t_d \mid x_d, W)$ a constant minus a positive multiple of $(t_d - o_d)^2$, so
$$W_{MCLE} = \arg\max_W \sum_d \ln P(t_d \mid x_d, W) = \arg\min_W \sum_d (t_d - o_d)^2$$

MAP Training for Neural Networks
• Consider a regression problem f: X → Y, for scalar Y: $y = f(x) + \varepsilon$, where $f$ is deterministic and the noise $\varepsilon \sim N(0, \sigma_\varepsilon)$
• Gaussian prior on the weights: $P(W) = N(0, \sigma I)$, so $\ln P(W) = -c \sum_i w_i^2 + \text{const}$ for some constant $c > 0$
• Maximizing $\sum_d \ln P(t_d \mid x_d, W) + \ln P(W)$ therefore amounts to minimizing the sum of squared errors plus a penalty on the weight magnitudes:
$$W_{MAP} = \arg\min_W \left[ \sum_d (t_d - o_d)^2 + c' \sum_i w_i^2 \right]$$
for a constant $c' > 0$ that depends on $c$ and $\sigma_\varepsilon$.

Notation: $x_d$ = input, $t_d$ = target output, $o_d$ = observed unit output, $w_i$ = weight $i$

Notation (MLE): $x_d$ = input, $t_d$ = target output, $o_d$ = observed unit output, $w_{ij}$ = weight from unit $i$ to unit $j$

Dealing with Overfitting
Our learning algorithm involves a parameter n = number of gradient descent iterations.
How do we choose n to optimize future error, i.e., the n that minimizes the error rate of the neural net over future data? (A similar issue arises for logistic regression, decision trees, …)
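Before turning to how n is chosen, the MLE and MAP objectives above and the gradient-descent loop they imply can be sketched in code. This is a minimal, dependency-free sketch under assumptions the slides do not state: one hidden layer of logistic units feeding a single logistic output unit, full-batch gradient descent with a fixed learning rate `eta`, the squared error scaled by 1/2 (which does not change the minimizer), and the penalty `c` applied to every weight, including the bias weight w0. The names `init_net`, `forward`, `loss_and_grads` and `train` are hypothetical. Setting `c = 0` gives the MLE objective; `c > 0` gives the MAP objective with a Gaussian weight prior.

```python
import math
import random


def sigmoid(z):
    """Logistic function: each unit outputs sigmoid(weighted sum of its inputs)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_net(n_in, n_hidden, seed=0):
    """Small random weights; index 0 of every weight vector is the bias weight w0."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return W1, W2


def forward(net, x):
    """Return the hidden activations h and the network output o for input x."""
    W1, W2 = net
    h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in W1]
    o = sigmoid(W2[0] + sum(wj * hj for wj, hj in zip(W2[1:], h)))
    return h, o


def loss_and_grads(net, data, c=0.0):
    """E(W) = 1/2 sum_d (t_d - o_d)^2 + c/2 sum_i w_i^2 and dE/dW via backpropagation.

    c = 0: MLE (sum of squared errors). c > 0: MAP with a Gaussian weight prior.
    """
    W1, W2 = net
    g1 = [[c * w for w in row] for row in W1]  # gradient of the weight penalty
    g2 = [c * w for w in W2]
    loss = 0.5 * c * (sum(w * w for row in W1 for w in row) + sum(w * w for w in W2))
    for x, t in data:
        h, o = forward(net, x)
        loss += 0.5 * (t - o) ** 2
        # Output unit: dE/d(net input) = -(t - o) * o * (1 - o)
        delta_o = -(t - o) * o * (1 - o)
        g2[0] += delta_o
        for j, hj in enumerate(h):
            g2[j + 1] += delta_o * hj
            # Hidden unit j: error propagated back through weight W2[j + 1]
            delta_h = delta_o * W2[j + 1] * hj * (1 - hj)
            g1[j][0] += delta_h
            for i, xi in enumerate(x):
                g1[j][i + 1] += delta_h * xi
    return loss, (g1, g2)


def train(net, data, n_iters, eta=0.5, c=0.0):
    """Batch gradient descent for n_iters iterations (the n on the overfitting slide)."""
    W1, W2 = net
    for _ in range(n_iters):
        _, (g1, g2) = loss_and_grads((W1, W2), data, c)
        W1 = [[w - eta * g for w, g in zip(row, grow)] for row, grow in zip(W1, g1)]
        W2 = [w - eta * g for w, g in zip(W2, g2)]
    return W1, W2


# Usage: XOR is a simple non-linear target that a single logistic unit cannot fit.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
net = init_net(n_in=2, n_hidden=3)
before, _ = loss_and_grads(net, XOR)
after, _ = loss_and_grads(train(net, XOR, n_iters=2000), XOR)
print(f"objective before: {before:.3f}, after 2000 iterations: {after:.3f}")
```

Because `loss_and_grads` returns both the objective and its gradient, the backpropagated gradient can be compared against finite differences of the objective, a cheap way to catch sign or indexing mistakes.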
• Separate the available data into a training set and a validation set
• Use the training set to perform gradient descent
• n ← number of iterations that optimizes validation-set error → gives an unbiased estimate of the optimal n (but a biased estimate of the true error)

K-Fold Cross Validation
Idea: train multiple times, leaving out a disjoint subset of the data each time for testing. Average the test-set accuracies.
________________________________________________
Partition data into K disjoint subsets
For k = 1 to K
    testData = kth subset
    h ← classifier trained* on all data except for testData
    accuracy(k) = accuracy of h on testData
end
FinalAccuracy = mean of the K recorded test-set accuracies

* might withhold some of this to choose the number of gradient descent steps

Leave-One-Out Cross Validation
This is just k-fold cross validation leaving out one example each iteration.
________________________________________________
Partition data into K disjoint subsets, each containing one example
For k = 1 to K
    testData = kth subset
    h ← classifier trained* on all data except for testData
    accuracy(k) = accuracy of h on testData
end
FinalAccuracy = mean of the K recorded test-set accuracies

* might withhold some of this to choose the number of gradient descent steps
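The validation-set procedure for choosing n can be sketched for the simplest case, a single logistic unit (the slides note that logistic regression faces the same choice of n). Assumptions not in the slides: a random 70/30 train/validation split, full-batch gradient descent on half the sum of squared errors, zero initial weights, and the hypothetical helper names `predict`, `sse`, `gd_step` and `early_stopping`.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Single logistic unit; w[0] is the bias weight w0."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def sse(w, data):
    """Sum of squared errors of the unit's outputs on a data set."""
    return sum((t - predict(w, x)) ** 2 for x, t in data)


def gd_step(w, data, eta):
    """One batch gradient-descent step on E = 1/2 sum_d (t_d - o_d)^2."""
    grad = [0.0] * len(w)
    for x, t in data:
        o = predict(w, x)
        delta = -(t - o) * o * (1 - o)  # dE/d(net input) for this example
        grad[0] += delta
        for i, xi in enumerate(x):
            grad[i + 1] += delta * xi
    return [wi - eta * gi for wi, gi in zip(w, grad)]


def early_stopping(data, max_iters=500, val_fraction=0.3, eta=0.1, seed=0):
    """Choose n = number of gradient-descent iterations by validation-set error.

    Gradient descent sees only the training split; the validation split is used
    only to record the error after each iteration.
    Returns (best_n, weights after best_n iterations, validation-error curve).
    """
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    val_set, train_set = shuffled[:n_val], shuffled[n_val:]
    w = [0.0] * (len(data[0][0]) + 1)
    best_n, best_w, best_err = 0, w, sse(w, val_set)
    curve = [best_err]
    for n in range(1, max_iters + 1):
        w = gd_step(w, train_set, eta)
        err = sse(w, val_set)
        curve.append(err)
        if err < best_err:  # keep the first iteration count achieving the minimum
            best_n, best_w, best_err = n, w, err
    return best_n, best_w, curve


# Usage: few noisy examples, many irrelevant inputs (only x[0] matters).
rng = random.Random(1)
data = []
for _ in range(40):
    x = [rng.gauss(0, 1) for _ in range(10)]
    data.append((x, 1 if x[0] + rng.gauss(0, 1) > 0 else 0))
best_n, best_w, curve = early_stopping(data)
print(f"chosen n = {best_n}, validation SSE there = {curve[best_n]:.3f}")
```

The validation error at `best_n` was itself used to select `best_n`, so it is an optimistically biased estimate of the true error, as the slide warns; estimating the true error needs data that played no part in choosing n.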

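The K-fold and leave-one-out pseudocode translates almost line for line into Python. In this sketch the learner is passed in as a `train` function that returns a classifier h, the K disjoint subsets are formed by shuffling and then taking every K-th example, and `train_majority` and `train_nearest_mean` are hypothetical toy learners used only for illustration. A `train` function could itself withhold part of its data to choose the number of gradient descent steps, as the footnote suggests.

```python
import random
from collections import Counter
from statistics import mean


def k_fold_cv(data, k, train, seed=0):
    """K-fold cross validation; returns (FinalAccuracy, per-fold accuracies).

    data: list of (x, label) pairs. train: function(list of pairs) -> h, h(x) -> label.
    """
    if not 2 <= k <= len(data):
        raise ValueError("need 2 <= k <= number of examples")
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # K disjoint subsets
    accuracies = []
    for i, test_data in enumerate(folds):
        # h <- classifier trained on all data except for test_data
        train_data = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        h = train(train_data)
        correct = sum(h(x) == y for x, y in test_data)
        accuracies.append(correct / len(test_data))
    return mean(accuracies), accuracies  # mean of the K recorded accuracies


def leave_one_out_cv(data, train):
    """Leave-one-out: k-fold cross validation with K = number of examples."""
    return k_fold_cv(data, len(data), train)


def train_majority(train_data):
    """Hypothetical baseline: always predict the most common training label."""
    label = Counter(y for _, y in train_data).most_common(1)[0][0]
    return lambda x: label


def train_nearest_mean(train_data):
    """Hypothetical 1-D learner: predict the label whose training mean is closest to x."""
    means = {lab: mean(x for x, y in train_data if y == lab) for lab in {y for _, y in train_data}}
    return lambda x: min(means, key=lambda lab: abs(x - means[lab]))


# Usage on a small 1-D data set.
data = [(0.1, 0), (0.4, 0), (0.3, 0), (1.1, 0), (2.0, 1), (2.5, 1), (1.8, 1), (1.2, 1)]
acc, per_fold = k_fold_cv(data, k=4, train=train_nearest_mean)
loo_acc, _ = leave_one_out_cv(data, train_nearest_mean)
print(f"4-fold accuracy: {acc:.3f}, leave-one-out accuracy: {loo_acc:.3f}")
```

Averaging per-fold accuracies, as the pseudocode does, matches pooling all predictions only when the folds have equal size; with every-K-th-example folds the sizes differ by at most one.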