CMU CS 10601 - Notes

Neural Networks
Aarti Singh, Machine Learning 10-601, Nov 3, 2011
Slides courtesy: Tom Mitchell

Logistic Regression
Assumes the following functional form for P(Y|X): the logistic (sigmoid) function applied to a linear function of the data,
    P(Y=1|X) = 1 / (1 + exp(-(w_0 + Σ_i w_i X_i))),
i.e., σ(z) with z = w_0 + Σ_i w_i X_i, where σ(z) = 1 / (1 + e^(-z)). Features can be discrete or continuous.

Logistic Regression is a Linear Classifier
Under this functional form the decision boundary, the set of points where P(Y=1|X) = P(Y=0|X), is given by w_0 + Σ_i w_i X_i = 0: a linear decision boundary.

Training Logistic Regression
How do we learn the parameters w_0, w_1, ..., w_d? From training data, by Maximum (Conditional) Likelihood Estimation. This is the discriminative philosophy: don't waste effort learning P(X); focus on P(Y|X), since that is all that matters for classification.

Optimizing a Convex Function
• Maximizing the conditional log-likelihood is the same as minimizing the negative conditional log-likelihood.
• The negative conditional log-likelihood is a convex function, so gradient descent can be used: with learning rate η > 0, repeatedly apply the update w_i ← w_i - η ∂E/∂w_i, where E denotes the negative conditional log-likelihood.

Logistic Function as a Graph
A sigmoid unit takes inputs x_1, ..., x_d with weights w_1, ..., w_d and a bias weight w_0, computes net = w_0 + Σ_{i=1..d} w_i x_i, and outputs o = σ(net). Logistic regression is a single such unit.

Neural Networks to Learn f: X → Y
• f can be a non-linear function.
• X: (vector of) continuous and/or discrete variables.
• Y: (vector of) continuous and/or discrete variables.
• Neural networks represent f by a network of logistic/sigmoid units: an input layer X, a hidden layer H, and an output layer Y.

Example networks:
• A neural network trained to distinguish vowel sounds using 2 formants as features: two layers of logistic units (input layer, hidden layer, output layer) yield a highly non-linear decision surface.
• A neural network trained to drive a car: each hidden unit has one weight per input pixel, and the hidden units' outputs are weighted into the output units.

Forward Propagation for Prediction
Prediction: given a neural network (its hidden units and weights), use it to predict the label of a test point.
Forward propagation: start from the input layer; for each subsequent layer, compute the output of every sigmoid unit from the outputs of the previous layer. For a 1-hidden-layer, 1-output network,
    o = σ( w_0 + Σ_h w_h o_h ),  with  o_h = σ( w_{0h} + Σ_i w_{ih} x_i ).
The resulting function is differentiable in the weights.

Training Neural Networks
Consider a regression problem f: X → Y for scalar Y, with y = f(x) + ε, where f is deterministic and ε is iid zero-mean Gaussian noise N(0, σ_ε²).

M(C)LE Training for Neural Networks
Maximizing the conditional data likelihood under this noise model is equivalent to training the weights of all units to minimize the sum of squared errors of the predicted network outputs:
    W ← arg min_W Σ_l ( y^l - f̂(x^l; W) )².

MAP Training for Neural Networks
With a Gaussian prior on the weights, P(W) = N(0, σI), we have ln P(W) ∝ -c Σ_i w_i², so MAP training minimizes the sum of squared errors of the predicted network outputs plus the weight magnitudes:
    W ← arg min_W Σ_l ( y^l - f̂(x^l; W) )² + c Σ_i w_i².
Here E is the (mean) squared error; for neural networks, E[W] is no longer convex in W.

Error Gradient for a Sigmoid Unit
Notation: y_k is the target output (label), o_k (or o_h) is a unit's output obtained by forward propagation, and w_ij is the weight from unit i to unit j; if i is an input variable, then o_i = x_i. For a single sigmoid unit trained to minimize E = ½ Σ_l (y^l - o^l)² over the training data D, the chain rule through o = σ(net) gives
    ∂E/∂w_i = - Σ_l (y^l - o^l) o^l (1 - o^l) x_i^l.
Applying the same chain rule layer by layer, using the values computed by forward propagation, gives the backpropagation updates: δ_k = o_k (1 - o_k)(y_k - o_k) for an output unit, δ_h = o_h (1 - o_h) Σ_k w_{hk} δ_k for a hidden unit, and w_ij ← w_ij + η δ_j o_i. Using all of the training data D, the objective/error is no longer convex in the weights, so gradient descent is only guaranteed to find a local minimum.
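
Code sketch: training a 1-hidden-layer sigmoid network
The training equations above appear only as figures in this preview, so the following sketch (not from the lecture) shows one concrete way the pieces fit together: forward propagation through a 1-hidden-layer network of sigmoid units, the sigmoid-unit error gradient, gradient descent on the squared error, and an optional weight-decay term corresponding to the MAP view. The network size, learning rate, iteration count, and the XOR example data are illustrative choices, not values from the notes.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(X, W1, b1, W2, b2):
        # Forward propagation: each layer's sigmoid outputs feed the next layer.
        H = sigmoid(X @ W1 + b1)      # hidden-unit outputs o_h
        o = sigmoid(H @ W2 + b2)      # network output
        return H, o

    def train(X, y, n_hidden=3, eta=1.0, weight_decay=0.0, n_iters=10000, seed=0):
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        W1 = rng.uniform(-0.5, 0.5, size=(d, n_hidden)); b1 = np.zeros(n_hidden)
        W2 = rng.uniform(-0.5, 0.5, size=(n_hidden, 1)); b2 = np.zeros(1)
        y = y.reshape(-1, 1)
        for _ in range(n_iters):
            H, o = forward(X, W1, b1, W2, b2)
            # Output unit:  delta_k = o_k (1 - o_k) (y_k - o_k)
            delta_out = o * (1 - o) * (y - o)
            # Hidden units: delta_h = o_h (1 - o_h) * sum_k w_hk delta_k
            delta_hid = H * (1 - H) * (delta_out @ W2.T)
            # Descend the squared error; weight_decay is the MAP-style penalty
            # on sum_i w_i^2 (set to 0 for plain M(C)LE training).
            W2 += eta * (H.T @ delta_out) - eta * weight_decay * W2
            b2 += eta * delta_out.sum(axis=0)
            W1 += eta * (X.T @ delta_hid) - eta * weight_decay * W1
            b1 += eta * delta_hid.sum(axis=0)
        return W1, b1, W2, b2

    # Example: XOR, a non-linear function that a single logistic unit cannot fit.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0.0, 1.0, 1.0, 0.0])
    W1, b1, W2, b2 = train(X, y)
    _, o = forward(X, W1, b1, W2, b2)
    print(np.round(o.ravel(), 2))     # typically close to [0, 1, 1, 0]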
Dealing with Overfitting
Our learning algorithm involves a parameter n = the number of gradient descent iterations. How do we choose n to optimize future error, e.g., the n that minimizes the error rate of the neural net over future data? (A similar issue arises for logistic regression, decision trees, ...)
• Separate the available data into a training set and a validation set.
• Use the training set to perform gradient descent.
• Set n to the number of iterations that optimizes validation-set error.

K-fold Cross-validation
Idea: train multiple times, leaving out a disjoint subset of the data each time for testing, then average the test-set accuracies. (A code sketch of this procedure appears at the end of these notes.)
    Partition the data into K disjoint subsets.
    For k = 1 to K:
        testData = the k-th subset
        h ← classifier trained* on all data except testData
        accuracy(k) = accuracy of h on testData
    FinalAccuracy = mean of the K recorded test-set accuracies
    * one might withhold some of this data to choose the number of gradient descent steps

Leave-one-out Cross-validation
This is just K-fold cross-validation in which each subset contains a single example, so one example is left out on each iteration; the procedure is otherwise identical.

Dealing with Overfitting: summary of approaches
• Cross-validation.
• Regularization: small weights imply the network is nearly linear (low VC dimension).
• Control the number of hidden units: low complexity.

Semantic Memory Model Based on ANNs [McClelland & Rogers, Nature 2003]
A network is trained with assertions such as Can(Canary, Fly); no hierarchy is given to it. Yet humans act as though they have a hierarchical memory organization (e.g., Thing → Living / NonLiving; Living → Animal / Plant; Animal → Bird / Fish; Bird → Canary):
1. Victims of semantic dementia progressively lose knowledge of objects, but they lose specific details first and general properties later, suggesting a hierarchical memory organization.
2. Children appear to learn general categories and properties first, following the same hierarchy, top down (some debate remains on this).
Question: what learning mechanism could produce this emergent hierarchy? In the model, memory deterioration follows the semantic hierarchy [McClelland & Rogers, Nature 2003].

Training Networks on Time Series
• Suppose we want to predict the next state of the world, and it depends on a history of unknown length; e.g., a robot with forward-facing sensors trying to predict its next sensor reading as it moves and turns.
• Idea: use a hidden layer in the network to capture the state history, i.e., a recurrent network (a minimal forward-pass sketch appears at the end of these notes).
• How can we train such a recurrent net?

Artificial Neural Networks: Summary
• Actively used to model distributed computation in the brain.
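
Code sketch: K-fold cross-validation
The K-fold cross-validation pseudocode above translates almost directly into code. In this sketch, train and accuracy are stand-ins for whatever learner and evaluation metric are being used (e.g., a neural net trained for a fixed number of gradient-descent steps); the value K = 10 and the initial shuffle are illustrative choices, not part of the notes.

    import random

    def k_fold_accuracy(data, train, accuracy, K=10, seed=0):
        data = list(data)
        random.Random(seed).shuffle(data)
        # Partition the data into K disjoint subsets (folds).
        folds = [data[i::K] for i in range(K)]
        scores = []
        for k in range(K):
            test_data = folds[k]
            train_data = [x for i, fold in enumerate(folds) if i != k for x in fold]
            h = train(train_data)          # * may itself hold out data to pick n
            scores.append(accuracy(h, test_data))
        # FinalAccuracy = mean of the K recorded test-set accuracies.
        return sum(scores) / K

    # Leave-one-out cross-validation is the special case with one example per fold:
    #   k_fold_accuracy(data, train, accuracy, K=len(data))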


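Code sketch: a simple recurrent network's forward pass
The time-series slides stop at the question of how to train a recurrent net, so this sketch only illustrates the idea stated above: feed the hidden layer back in at the next time step so that it can capture state history. It shows the forward pass of a simple recurrent network with sigmoid units; all sizes and weights are arbitrary illustrative values, not from the lecture.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def recurrent_forward(xs, W_in, W_rec, W_out):
        # Hidden state starts "empty" and is carried from one time step to the next.
        h = np.zeros(W_rec.shape[0])
        outputs = []
        for x in xs:
            # Hidden units see the current input AND the previous hidden state.
            h = sigmoid(W_in @ x + W_rec @ h)
            outputs.append(sigmoid(W_out @ h))
        return outputs

    # Example: 10 time steps of 3 "sensor readings", 5 hidden units (arbitrary sizes).
    rng = np.random.default_rng(0)
    W_in, W_rec, W_out = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), rng.normal(size=(3, 5))
    predictions = recurrent_forward(rng.normal(size=(10, 3)), W_in, W_rec, W_out)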