U of M CS 5751 - Artificial Neural Networks


CS 5751 Machine Learning, Chapter 4: Artificial Neural Networks

Artificial Neural Networks
• Threshold units
• Gradient descent
• Multilayer networks
• Backpropagation
• Hidden layer representations
• Example: face recognition
• Advanced topics

Connectionist Models
Consider humans:
• Neuron switching time: ~0.001 second
• Number of neurons: ~10^10
• Connections per neuron: ~10^4 to 10^5
• Scene recognition time: ~0.1 second
• 100 inference steps doesn't seem like enough
• Humans must use lots of parallel computation!
Properties of artificial neural nets (ANNs):
• Many neuron-like threshold switching units
• Many weighted interconnections among units
• Highly parallel, distributed processing
• Emphasis on tuning weights automatically

When to Consider Neural Networks
• Input is high-dimensional discrete or real-valued (e.g., raw sensor input)
• Output is discrete or real-valued
• Output is a vector of values
• Possibly noisy data
• Form of target function is unknown
• Human readability of result is unimportant
Examples:
• Speech phoneme recognition [Waibel]
• Image classification [Kanade, Baluja, Rowley]
• Financial prediction

ALVINN Drives 70 mph on Highways
[Figure: a 30x32 sensor input retina feeds 4 hidden units, whose outputs drive steering units ranging from Sharp Left through Straight Ahead to Sharp Right.]

Perceptron
[Figure: inputs x_1, ..., x_n with weights w_1, ..., w_n, plus a fixed input x_0 = 1 with weight w_0, are summed and thresholded.]
o(x_1, \ldots, x_n) = 1 if w_0 + w_1 x_1 + \cdots + w_n x_n > 0, and -1 otherwise
Sometimes we will use the simpler vector notation:
o(\vec{x}) = 1 if \vec{w} \cdot \vec{x} > 0, and -1 otherwise

Decision Surface of Perceptron
[Figure: in the (x_1, x_2) plane, one panel shows + and - points separated by a line; a second panel shows points no single line can separate.]
Represents some useful functions:
• What weights represent g(x_1, x_2) = AND(x_1, x_2)?
But some functions are not representable:
• e.g., those not linearly separable
• therefore, we will want networks of these units ...

Perceptron Training Rule
w_i \leftarrow w_i + \Delta w_i, where \Delta w_i = \eta (t - o) x_i
where:
• t = c(\vec{x}) is the target value
• o is the perceptron output
• \eta is a small constant (e.g., 0.1) called the learning rate
Can prove the rule will converge if:
• the training data is linearly separable, and
• \eta is sufficiently small

Gradient Descent
To understand, consider the simpler linear unit, where
o = w_0 + w_1 x_1 + \cdots + w_n x_n
Idea: learn the w_i's that minimize the squared error
E[\vec{w}] \equiv \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2
where D is the set of training examples.

Gradient Descent
[Figure: the error surface E over the weight space is a parabolic bowl; gradient descent steps move downhill toward the minimum.]

Gradient Descent
Gradient:
\nabla E[\vec{w}] \equiv \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n} \right]
Training rule:
\Delta \vec{w} = -\eta \nabla E[\vec{w}], i.e., \Delta w_i = -\eta \frac{\partial E}{\partial w_i}

Gradient Descent
\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i} \frac{1}{2} \sum_d (t_d - o_d)^2
  = \frac{1}{2} \sum_d \frac{\partial}{\partial w_i} (t_d - o_d)^2
  = \frac{1}{2} \sum_d 2 (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - o_d)
  = \sum_d (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - \vec{w} \cdot \vec{x}_d)
  = \sum_d (t_d - o_d)(-x_{i,d})

GRADIENT-DESCENT(training_examples, \eta)
Each training example is a pair of the form <\vec{x}, t>, where \vec{x} is the vector of input values and t is the target output value. \eta is the learning rate (e.g., 0.05).
• Initialize each w_i to some small random value
• Until the termination condition is met, do
  - Initialize each \Delta w_i to zero
  - For each <\vec{x}, t> in training_examples, do
    * Input the instance \vec{x} and compute output o
    * For each linear unit weight w_i, do
      \Delta w_i \leftarrow \Delta w_i + \eta (t - o) x_i
  - For each linear unit weight w_i, do
    w_i \leftarrow w_i + \Delta w_i

Summary
Perceptron training rule guaranteed to succeed if:
• training examples are linearly separable
• the learning rate \eta is sufficiently small
Linear unit training rule uses gradient descent:
• guaranteed to converge to the hypothesis with minimum squared error
• given a sufficiently small learning rate \eta
• even when the training data contains noise
• even when the training data is not separable by H

Incremental (Stochastic) Gradient Descent
Batch mode gradient descent: do until satisfied:
1. Compute the gradient \nabla E_D[\vec{w}]
2. \vec{w} \leftarrow \vec{w} - \eta \nabla E_D[\vec{w}]
where E_D[\vec{w}] \equiv \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2
Incremental mode gradient descent: do until satisfied, for each training example d in D:
1. Compute the gradient \nabla E_d[\vec{w}]
2. \vec{w} \leftarrow \vec{w} - \eta \nabla E_d[\vec{w}]
where E_d[\vec{w}] \equiv \frac{1}{2} (t_d - o_d)^2
Incremental gradient descent can approximate batch gradient descent arbitrarily closely if \eta is made small enough.

Multilayer Networks of Sigmoid Units
[Figure: inputs x_1, x_2 feed hidden units h_1, h_2, which feed output o; the network carves the input space into regions d_0 through d_3.]

Multilayer Decision Space
[Figure: the nonlinear decision regions a multilayer network can form in the input space.]

Sigmoid Unit
[Figure: like the perceptron, but the weighted sum net = \sum_{i=0}^{n} w_i x_i is passed through o = \sigma(net) = \frac{1}{1 + e^{-net}} instead of a hard threshold.]
\sigma(x) is the sigmoid function:
\sigma(x) = \frac{1}{1 + e^{-x}}
Nice property:
\frac{d\sigma(x)}{dx} = \sigma(x)(1 - \sigma(x))
We can derive gradient descent rules to train:
• one sigmoid unit
• multilayer networks of sigmoid units: Backpropagation

The Sigmoid Function
[Figure: plot of output \sigma(net) = \frac{1}{1 + e^{-net}} against net input from -6 to 6; the output rises smoothly from 0 to 1.]
Sort of a rounded step function. Unlike the step function, we can take its derivative (which makes learning possible).

Error Gradient for a Sigmoid Unit
\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i} \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2
  = \frac{1}{2} \sum_d \frac{\partial}{\partial w_i} (t_d - o_d)^2
  = \frac{1}{2} \sum_d 2 (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - o_d)
  = \sum_d (t_d - o_d) \left( -\frac{\partial o_d}{\partial w_i} \right)
  = -\sum_d (t_d - o_d) \frac{\partial o_d}{\partial net_d} \frac{\partial net_d}{\partial w_i}
But we know:
\frac{\partial o_d}{\partial net_d} = \frac{\partial \sigma(net_d)}{\partial net_d} = o_d (1 - o_d)
\frac{\partial net_d}{\partial w_i} = \frac{\partial (\vec{w} \cdot \vec{x}_d)}{\partial w_i} = x_{i,d}
So:
\frac{\partial E}{\partial w_i} = -\sum_{d \in D} (t_d - o_d) o_d (1 - o_d) x_{i,d}

Backpropagation Algorithm
Initialize all weights to small random numbers.
Until satisfied, do:
• For each training example, do
  1. Input the training example and compute the network outputs
  2. For each output unit k:
     \delta_k \leftarrow o_k (1 - o_k)(t_k - o_k)
  3. For each hidden unit h:
     \delta_h \leftarrow o_h (1 - o_h) \sum_{k \in outputs} w_{k,h} \delta_k
  4. Update each network weight w_{j,i}:
     w_{j,i} \leftarrow w_{j,i} + \Delta w_{j,i}, where \Delta w_{j,i} = \eta \delta_j x_{j,i}
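The perceptron decision rule and the perceptron training rule above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the course; the function and variable names are my own. It trains the AND function mentioned on the decision-surface slide, which is linearly separable, so the rule converges.

```python
# Hypothetical sketch of the perceptron decision rule and the
# perceptron training rule w_i <- w_i + eta*(t - o)*x_i.

def perceptron_output(w, x):
    """o(x) = 1 if w0 + w1*x1 + ... + wn*xn > 0, else -1.
    w holds n+1 weights; x holds n inputs (x0 = 1 is implicit)."""
    net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if net > 0 else -1

def train_perceptron(examples, n, eta=0.1, epochs=100):
    """Apply the training rule to each misclassified example in turn.
    examples: list of (x, t) pairs with target t in {-1, +1}."""
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for x, t in examples:
            o = perceptron_output(w, x)
            if o != t:
                w[0] += eta * (t - o)              # bias input x0 = 1
                for i, xi in enumerate(x, start=1):
                    w[i] += eta * (t - o) * xi
    return w

# AND(x1, x2) with -1/+1 encoding is linearly separable.
and_examples = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w_and = train_perceptron(and_examples, n=2)
```

Because the data is separable and eta is small, the learned weights classify all four AND examples correctly, matching the convergence claim on the summary slide.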

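The GRADIENT-DESCENT(training_examples, eta) procedure above can be sketched directly for a linear unit. Again this is a sketch with names of my own choosing; the target function and data below are invented purely to exercise the algorithm.

```python
# Sketch of batch gradient descent for a linear unit, minimizing
# E[w] = 1/2 * sum_d (t_d - o_d)^2, following the slide pseudocode.

def gradient_descent(examples, n, eta=0.05, iterations=500):
    """Each example is a pair (x, t); x is the input vector, t the target."""
    w = [0.0] * (n + 1)
    for _ in range(iterations):
        delta_w = [0.0] * (n + 1)            # initialize each delta_w_i to zero
        for x, t in examples:
            # compute the linear unit's output o for this instance
            o = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            delta_w[0] += eta * (t - o)      # bias input x0 = 1
            for i, xi in enumerate(x, start=1):
                delta_w[i] += eta * (t - o) * xi
        for i in range(n + 1):               # w_i <- w_i + delta_w_i
            w[i] += delta_w[i]
    return w

# Noise-free data from an assumed linear target t = 1 + 2*x1 - 3*x2.
data = [((x1, x2), 1 + 2 * x1 - 3 * x2)
        for x1 in (0.0, 0.5, 1.0) for x2 in (0.0, 1.0)]
w_fit = gradient_descent(data, n=2)
```

With a sufficiently small learning rate the weights converge to the minimum-squared-error hypothesis, here the true coefficients (1, 2, -3). The incremental (stochastic) variant on the later slide would instead apply the update inside the inner loop, one example at a time.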

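The backpropagation algorithm above can likewise be sketched for a small network with one hidden layer of sigmoid units. This is a minimal illustration under my own assumptions (2 inputs, 2 hidden units, 1 output, weights stored as plain lists); it is not the course's implementation.

```python
import math
import random

# Sketch of one backpropagation update for a 2-2-1 sigmoid network,
# following steps 1-4 of the slide's algorithm.

def sigmoid(net):
    """sigma(net) = 1/(1 + e^-net); note d(sigma)/d(net) = sigma*(1 - sigma)."""
    return 1.0 / (1.0 + math.exp(-net))

def forward(W_h, W_o, x):
    """x includes the bias input x0 = 1; the hidden layer adds its own bias."""
    h = [1.0] + [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_h]
    o = [sigmoid(sum(w * hi for w, hi in zip(row, h))) for row in W_o]
    return h, o

def backprop_step(W_h, W_o, x, t, eta=0.5):
    h, o = forward(W_h, W_o, x)                       # step 1: compute outputs
    # step 2: delta_k = o_k (1 - o_k)(t_k - o_k) for each output unit k
    delta_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
    # step 3: delta_h = o_h (1 - o_h) * sum_k w_kh delta_k for each hidden unit
    delta_h = [h[j + 1] * (1 - h[j + 1]) *
               sum(W_o[k][j + 1] * delta_o[k] for k in range(len(W_o)))
               for j in range(len(W_h))]
    # step 4: w_ji <- w_ji + eta * delta_j * x_ji
    for k, row in enumerate(W_o):
        for i in range(len(row)):
            row[i] += eta * delta_o[k] * h[i]
    for j, row in enumerate(W_h):
        for i in range(len(row)):
            row[i] += eta * delta_h[j] * x[i]

random.seed(0)  # small random initial weights, as the slide prescribes
W_h = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
W_o = [[random.uniform(-0.5, 0.5) for _ in range(3)]]
```

Repeating `backprop_step` on a single training example drives the network output toward the target, since each step is a gradient descent step on the squared error for that example.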