Neural Networks
10-701 / 15-781 Recitation, February 12, 2008
Parts of these slides are from previous years' 10-701 recitation and lecture notes, and from Prof. Andrew Moore's data mining tutorials.

Recall: Linear Regression
- Prediction of continuous variables: learn the mapping $f: X \to Y$, with $f(x) = \sum_i w_i x_i$ plus some noise.
- The model is linear in the parameters $w$; assume the noise is Gaussian.
- The maximum-likelihood estimate then has the closed form $\hat{w}_{MLE} = (X^T X)^{-1} X^T Y$. (A numpy sketch of this closed form appears at the end of these notes.)

Neural Network
- Neural nets are also models with parameters $w$ in them; the parameters are now called weights.
- As before, we want to compute the weights that minimize the sum of squared residuals, which under a Gaussian i.i.d. noise assumption is again the maximum-likelihood solution.
- Instead of explicitly solving for the maximum-likelihood weights, we use gradient descent.

Perceptrons
- Input $x = (x_1, \ldots, x_n)$ and target value $t$.
- Output $o(x) = f\!\left(w_0 + \sum_{i=1}^n w_i x_i\right)$, where $net = w_0 + \sum_{i=1}^n w_i x_i$.
- E.g. the step activation, $o(x) = \mathrm{sign}(net)$: $+1$ if $net \ge 0$, $-1$ otherwise; or the sigmoid activation, $o(x) = \sigma(net) = \frac{1}{1 + e^{-net}}$.
- Given training data $\{(x^{(l)}, t^{(l)})\}_{l=1}^L$, find $w$ which minimizes
  $E(w) = \frac{1}{2} \sum_{l=1}^L \left(t^{(l)} - o(x^{(l)})\right)^2$.

Gradient Descent
- General framework for finding a minimum of a continuous, differentiable function $f(w)$.
- Start with some initial value $w^{(1)}$ and compute the gradient vector $\nabla f(w^{(1)})$.
- The next value $w^{(2)}$ is obtained by moving some distance from $w^{(1)}$ in the direction of steepest descent, i.e. along the negative of the gradient:
  $w^{(k+1)} = w^{(k)} - \eta_k \, \nabla f(w^{(k)})$.
  (A toy numerical example appears at the end of these notes.)

Gradient Descent on a Perceptron
- The sigmoid perceptron update rule:
  $w_j \leftarrow w_j + \eta \sum_{l=1}^L \delta^{(l)} x_j^{(l)}$,
  where $\delta^{(l)} = \left(t^{(l)} - o^{(l)}\right) o^{(l)} \left(1 - o^{(l)}\right)$, $o^{(l)} = \sigma\!\left(\sum_{j=0}^n w_j x_j^{(l)}\right)$, and $x_0^{(l)} = 1$.
- This uses the identity $\sigma'(net) = \sigma(net)\,(1 - \sigma(net))$. (A full training loop is sketched at the end of these notes.)

Boolean Functions
- E.g. using the step activation function with threshold 0, can we learn the functions $X_1$ AND $X_2$? $X_1$ OR $X_2$? $X_1$ AND NOT $X_2$? $X_1$ XOR $X_2$?
- AND, OR, and AND-NOT are linearly separable, so a single perceptron can learn them; XOR is not, which motivates multilayer networks.

Multilayer Networks
- The class of functions representable by a single perceptron is limited. Think of nonlinear functions:
  $o(x) = h\!\left(\sum_j W_j \, f\!\left(\sum_i w_{ji} x_i\right)\right)$
- [Figure: a 1-hidden-layer net with $N_{input} = 2$, $N_{hidden} = 3$, $N_{output} = 1$.]

Backpropagation (HW2, Problem 2)
- Output of the $k$-th output unit given input $x$:
  $o_k(x) = f\!\left(\sum_j W_{kj} \, f\!\left(\sum_i w_{ji} x_i\right)\right)$
- With bias: add a constant term for every non-input unit.
- Learn $w$ to minimize $E = \frac{1}{2} \sum_{k=1}^K \left(t_k - o_k(x)\right)^2$.

Backpropagation
Initialize all weights. Do until convergence:
1. Input a training example to the network and compute the output $o_k$.
2. Update each hidden-to-output weight $W_{kj}$ by $W_{kj} \leftarrow W_{kj} + \eta \, \delta_k \, y_j$, where $\delta_k = (t_k - o_k)\, f'(net_k)$ and $y_j$ is the output from hidden unit $j$.
3. Update each input-to-hidden weight $w_{ji}$ by $w_{ji} \leftarrow w_{ji} + \eta \, \delta_j \, y_i$, where $\delta_j = \left(\sum_k W_{kj} \, \delta_k\right) f'(net_j)$.
(A numpy sketch of this loop for the 2-3-1 network above follows below.)
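Code sketch: linear regression MLE. The closed form from the "Recall: Linear Regression" slide can be checked numerically. A minimal sketch, assuming synthetic data; the variable names, data sizes, and noise level are illustrative and not from the slides:

```python
import numpy as np

# Closed-form MLE from the linear regression recap: w_MLE = (X^T X)^{-1} X^T Y.
# Synthetic data: the sizes and noise level here are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 examples, 3 features
w_true = np.array([1.5, -2.0, 0.5])
Y = X @ w_true + 0.1 * rng.normal(size=100)    # linear model plus Gaussian noise

# Solving the normal equations is numerically safer than forming the inverse.
w_mle = np.linalg.solve(X.T @ X, X.T @ Y)      # should be close to w_true
```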
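Code sketch: generic gradient descent. The update $w^{(k+1)} = w^{(k)} - \eta_k \nabla f(w^{(k)})$ from the "Gradient Descent" slide, shown on a toy quadratic whose minimizer is known so convergence is easy to verify. The fixed step size and iteration count are assumptions:

```python
import numpy as np

# Gradient descent on f(w) = ||w - c||^2, whose gradient is 2(w - c) and
# whose minimizer is c. eta and iters are arbitrary illustrative choices.
def gradient_descent(grad_f, w0, eta=0.1, iters=100):
    w = w0
    for _ in range(iters):
        w = w - eta * grad_f(w)   # step along the negative gradient
    return w

c = np.array([3.0, -1.0])
w_min = gradient_descent(lambda w: 2 * (w - c), np.zeros(2))   # approaches c
```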
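Code sketch: sigmoid perceptron training. The batch update rule from the "Gradient Descent on a Perceptron" slide, applied to the linearly separable AND function from the "Boolean Functions" slide. The learning rate and iteration count are arbitrary choices, not values from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Batch gradient descent with the sigmoid perceptron update rule.
def train_perceptron(X, t, eta=0.5, iters=5000):
    L, n = X.shape
    Xb = np.hstack([np.ones((L, 1)), X])   # x_0 = 1 carries the bias weight w_0
    w = np.zeros(n + 1)
    for _ in range(iters):
        o = sigmoid(Xb @ w)                # o(x^(l)) for all l at once
        delta = (t - o) * o * (1 - o)      # delta^(l) = (t - o) * sigma'(net)
        w += eta * (Xb.T @ delta)          # w_j += eta * sum_l delta^(l) x_j^(l)
    return w

# X1 AND X2 is linearly separable, so this converges; XOR would not.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)
w = train_perceptron(X, t)   # sigmoid(Xb @ w) approaches t
```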
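Code sketch: backpropagation. The three-step loop from the final slide, written for the 2-3-1 sigmoid network of the "Multilayer Networks" slide. The weight layout (bias in the first column), learning rate, initialization scale, and epoch count are all illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One stochastic-gradient epoch of backpropagation for a 2-3-1 sigmoid network.
def backprop_epoch(W1, W2, X, T, eta=0.5):
    # W1: (3, 3) input-to-hidden weights, first column is the bias;
    # W2: (1, 4) hidden-to-output weights, first column is the bias.
    for x, t in zip(X, T):
        x_b = np.append(1.0, x)            # constant term for the input layer
        y = sigmoid(W1 @ x_b)              # hidden unit outputs y_j
        y_b = np.append(1.0, y)            # constant term for the hidden layer
        o = sigmoid(W2 @ y_b)              # network outputs o_k

        delta_k = (t - o) * o * (1 - o)                    # (t_k - o_k) f'(net_k)
        delta_j = (W2[:, 1:].T @ delta_k) * y * (1 - y)    # (sum_k W_kj delta_k) f'(net_j)

        W2 += eta * np.outer(delta_k, y_b)   # W_kj += eta * delta_k * y_j
        W1 += eta * np.outer(delta_j, x_b)   # w_ji += eta * delta_j * y_i
    return W1, W2

# XOR, which a single perceptron cannot represent, is typically learnable here.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 3))
W2 = rng.normal(scale=0.5, size=(1, 4))
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
for _ in range(10000):
    W1, W2 = backprop_epoch(W1, W2, X, T)
```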