Machine Learning 10-701
Tom M. Mitchell
Machine Learning Department
Carnegie Mellon University
March 24, 2011

Today:
• Non-linear regression
• Artificial neural networks
• Backpropagation
• Cognitive modeling
• Deep belief networks

Reading:
• Mitchell: Chapter 4
• Bishop: Chapter 5

Artificial Neural Networks to learn f: X → Y
• f might be a non-linear function
• X: (vector of) continuous and/or discrete variables
• Y: (vector of) continuous and/or discrete variables
• Represent f by a network of logistic units
• Each unit is a logistic function
• MLE: train weights of all units to minimize the sum of squared errors of predicted network outputs
• MAP: train to minimize the sum of squared errors plus weight magnitudes

ALVINN [Pomerleau 1993]

M(C)LE Training for Neural Networks
• Consider regression problem f: X → Y, for scalar Y:
  y = f(x) + ε, where f(x) is deterministic and the noise ε ~ N(0, σ_ε) is assumed iid
• Let's maximize the conditional data likelihood
[Equation not recovered; surviving label: "Learned neural network"]

MAP Training for Neural Networks
• Consider regression problem f: X → Y, for scalar Y:
  y = f(x) + ε, where f(x) is deterministic and the noise ε ~ N(0, σ_ε)
• Gaussian P(W) = N(0, σI)
  ln P(W) ↔ c ∑_i w_i^2

x_d = input
t_d = target output
o_d = observed unit output
w_i = weight i

x_d = input
t_d = target output
o_d = observed unit output
w_ij = weight from i to j
(MLE)

Dealing with Overfitting
Our learning algorithm involves a parameter n = number of gradient descent iterations.
How do we choose n to optimize future error?
(Note: a similar issue arises for logistic regression, decision trees, …)
e.g., the n that minimizes the error rate of the neural net over future data
• Separate the available data into a training set and a validation set
• Use the training set to perform gradient descent
• n ← number of iterations that optimizes validation set error
→ gives an unbiased estimate of the optimal n (but a biased estimate of the true error)

K-Fold Cross Validation
Idea: train multiple times, leaving out a disjoint subset of the data each time for test. Average the test set accuracies.

Partition data into K disjoint subsets
For k = 1 to K
    testData = kth subset
    h ← classifier trained* on all data except for testData
    accuracy(k) = accuracy of h on testData
end
FinalAccuracy = mean of the K recorded test set accuracies

* might withhold some of this to choose the number of gradient descent steps

Leave-One-Out Cross Validation
This is just K-fold cross validation leaving out one example each iteration.

Partition data into K disjoint subsets, each containing one example
For k = 1 to K
    testData = kth subset
    h ← classifier trained* on all data except for testData
    accuracy(k) = accuracy of h on testData
end
FinalAccuracy = mean of the K recorded test set accuracies

* might withhold some of this to choose the number of gradient descent steps

[Figure; surviving labels: w0, left, strt, right]
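
The K-fold and leave-one-out procedures above can be sketched in Python roughly as follows. This is a minimal illustration, not the course's code: the function names, the shuffling seed, the toy one-dimensional data, and the threshold learner standing in for the neural network are all assumptions made for the example.

```python
import random
from statistics import mean


def k_fold_accuracy(data, k, train_fn, seed=0):
    """Estimate accuracy by K-fold cross validation.

    data: list of (x, y) pairs.
    train_fn(train) must return a classifier h, called as h(x) -> y.
    """
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and len(data)")
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    # Partition into K disjoint subsets (sizes differ by at most one).
    folds = [shuffled[i::k] for i in range(k)]
    accuracies = []
    for i, test_data in enumerate(folds):
        # Train on all data except the k-th subset.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        h = train_fn(train)
        correct = sum(1 for x, y in test_data if h(x) == y)
        accuracies.append(correct / len(test_data))
    # FinalAccuracy = mean of the K recorded test set accuracies.
    return mean(accuracies)


def train_threshold(train):
    """Hypothetical stand-in learner: threshold halfway between class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    threshold = (mean(pos) + mean(neg)) / 2
    return lambda x: 1 if x > threshold else 0


# Toy, well-separated data (an assumption for illustration only).
data = [(x / 10, 0) for x in range(10)] + [(2 + x / 10, 1) for x in range(10)]
five_fold = k_fold_accuracy(data, k=5, train_fn=train_threshold)
# Leave-one-out is the special case K = number of examples.
leave_one_out = k_fold_accuracy(data, k=len(data), train_fn=train_threshold)
```

Passing the learner in as `train_fn` keeps the cross-validation loop independent of the model, so a network trainer that withholds part of its training data to choose the number of gradient descent steps could be substituted without changing the loop.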
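
Returning to the question of how to choose n, the validation-set procedure from Dealing with Overfitting might look like the sketch below. It assumes a single logistic unit trained by batch gradient descent on squared error, using the slides' t (target) and o (unit output) notation with w0 as the bias weight. The learning rate `eta`, the iteration cap, the synthetic data, and all function names are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    # w[0] is the bias weight w0; the remaining weights pair with the inputs.
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def sum_squared_error(w, data):
    return 0.5 * sum((t - predict(w, x)) ** 2 for x, t in data)


def gradient_step(w, data, eta):
    """One batch gradient descent step on squared error for one logistic unit."""
    grad = [0.0] * len(w)
    for x, t in data:
        o = predict(w, x)
        delta = (t - o) * o * (1.0 - o)  # negative error gradient w.r.t. net input
        grad[0] += delta
        for i, xi in enumerate(x, start=1):
            grad[i] += delta * xi
    return [wi + eta * gi for wi, gi in zip(w, grad)]


def choose_n(train, validation, max_iters=200, eta=0.1):
    """Return (best_n, best_w, history).

    best_n is the iteration count with the lowest validation error;
    history[n] is the validation error after n gradient steps.
    """
    w = [0.0] * (len(train[0][0]) + 1)
    best_n, best_w = 0, w
    best_err = sum_squared_error(w, validation)
    history = [best_err]
    for n in range(1, max_iters + 1):
        w = gradient_step(w, train, eta)  # gradient descent uses training data only
        err = sum_squared_error(w, validation)
        history.append(err)
        if err < best_err:
            best_n, best_w, best_err = n, w, err
    return best_n, best_w, history


# Synthetic noisy data (an assumption for illustration only).
rng = random.Random(0)


def make_example():
    x = rng.uniform(-1.0, 1.0)
    t = 1.0 if x + rng.gauss(0.0, 0.3) > 0 else 0.0
    return ([x], t)


examples = [make_example() for _ in range(60)]
train, validation = examples[:30], examples[30:]
best_n, best_w, history = choose_n(train, validation)
```

Because `best_n` is selected on the validation set, the validation error at `best_n` is an optimistic estimate of the true error, so a separate test set would be needed to estimate future error.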