Pitt CS 2750 - Multi-layer neural networks

CS 2750 Machine Learning, Lecture 10: Multi-layer neural networks
Milos Hauskrecht, [email protected], Sennott Square

Linear units

Linear regression:   f(x) = w^T x
Logistic regression: f(x) = p(y = 1 | x, w) = g(w^T x)

Gradient update (the same form for both models):
    w <- w + α Σ_{i=1}^{n} (y_i - f(x_i)) x_i
Online update:
    w <- w + α (y - f(x)) x

Limitations of basic linear units

Linear regression:   f(x) = w_0 + Σ_{j=1}^{d} w_j x_j
Logistic regression: f(x) = p(y = 1 | x, w) = g(w_0 + Σ_{j=1}^{d} w_j x_j)
The function is linear in the inputs, so the decision boundary is linear.

Extensions of simple linear units

• Use feature (basis) functions to model nonlinearities, where φ_j(x) is an arbitrary function of x:
    Linear regression:   f(x) = w_0 + Σ_{j=1}^{m} w_j φ_j(x)
    Logistic regression: f(x) = g(w_0 + Σ_{j=1}^{m} w_j φ_j(x))

[Figure: regression with a quadratic model.]
[Figure: a quadratic decision boundary.]

Multi-layered neural networks

• Offer an alternative way to introduce nonlinearities into regression/classification models.
• Idea: cascade several simple logistic regression units.
• Motivation: a neuron and its synaptic connections.

Model of a neuron

[Figure: weighted inputs are summed and passed through a threshold function to produce the output y.]

Multilayer neural network

• Cascades multiple logistic regression units, arranged in an input layer, a hidden layer, and an output layer.
• Also called a multilayer perceptron (MLP).
• Example: a (2-layer) classifier with non-linear decision boundaries.

Multilayer neural network

• Models non-linearities through logistic regression units.
• Can be applied to both regression and binary classification problems:
    classification: f(x) = p(y = 1 | x, w)
    regression:     f(x) = f(x, w)

Multilayer neural network

• Non-linearities are modeled using multiple hidden logistic regression units (organized in layers).
• The output layer determines whether it is a regression or a binary classification problem.

Learning with MLP

• How to learn the parameters of the neural network? Gradient descent.
• On-line version: weight updates are based on the error J_online(D_i, w) for a single data point D_i:
    w_j <- w_j - α ∂J_online(D_i, w) / ∂w_j
• We need to compute gradients for the weights in all units. They can be computed in one backward sweep through the net; the process is called back-propagation.

Backpropagation

Notation for levels (k-1), k, and (k+1) of the network:
    x_i(k)     - output of unit i on level k
    z_i(k)     - input to the sigmoid function of unit i on level k
    w_{j,i}(k) - weight between unit j on level (k-1) and unit i on level k
    z_i(k) = w_{0,i}(k) + Σ_j w_{j,i}(k) x_j(k-1)
    x_i(k) = g(z_i(k))
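To make the forward sweep concrete, here is a minimal Python/NumPy sketch using the notation above: each level computes z_i(k) = w_{0,i}(k) + Σ_j w_{j,i}(k) x_j(k-1) and x_i(k) = g(z_i(k)). The weight layout (bias stored in column 0), the helper names g and forward, and the 2-2-1 example sizes are illustrative choices, not something specified in the lecture.

import numpy as np

def g(z):
    # Sigmoid (logistic) function used by every unit.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    # Forward sweep through an MLP of cascaded logistic units.
    # weights[k] maps level k to level k+1 and has shape
    # (units on level k+1, 1 + units on level k); column 0 holds the bias w_{0,i}.
    # Returns the outputs x(0), x(1), ..., x(K) of all levels.
    outputs = [np.asarray(x, dtype=float)]            # x(0): the input layer
    for W in weights:
        prev = np.concatenate(([1.0], outputs[-1]))   # prepend constant 1 for the bias
        z = W @ prev                                  # z_i(k) = w_{0,i}(k) + sum_j w_{j,i}(k) x_j(k-1)
        outputs.append(g(z))                          # x_i(k) = g(z_i(k))
    return outputs

# Illustrative 2-2-1 network: 2 inputs, 2 hidden logistic units, 1 output unit.
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.5, size=(2, 3)),        # hidden layer weights
           rng.normal(scale=0.5, size=(1, 3))]        # output layer weights
print(forward([0.0, 1.0], weights)[-1])               # f(x) = p(y = 1 | x, w)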
Backpropagation (continued)

Update the weight w_{j,i}(k) using a data point D_u = <x, y>:
    w_{j,i}(k) <- w_{j,i}(k) - α ∂J_online(D_u, w) / ∂w_{j,i}(k)
Let
    δ_i(k) = ∂J_online(D_u, w) / ∂z_i(k)
Then
    ∂J_online(D_u, w) / ∂w_{j,i}(k) = [∂J_online(D_u, w) / ∂z_i(k)] [∂z_i(k) / ∂w_{j,i}(k)] = δ_i(k) x_j(k-1)
where δ_i(k) is computed from x_i(k) and the δ_l(k+1) of the next layer:
    δ_i(k) = [ Σ_l δ_l(k+1) w_{i,l}(k+1) ] x_i(k) (1 - x_i(k))
For the last unit (the same as for the regular linear units):
    δ_i(K) = -(y - f(x, w))
This holds both for classification with the log-likelihood measure of fit and for linear regression with the least-squares error.

Learning with MLP

• Online gradient descent algorithm with the weight update
    w_{j,i}(k) <- w_{j,i}(k) - α ∂J_online(D_u, w) / ∂w_{j,i}(k)
  Since ∂J_online(D_u, w) / ∂w_{j,i}(k) = δ_i(k) x_j(k-1), the update becomes
    w_{j,i}(k) <- w_{j,i}(k) - α δ_i(k) x_j(k-1)
  where
    x_j(k-1) - j-th output of the (k-1)-th layer
    δ_i(k)   - derivative computed via backpropagation
    α        - a learning rate

Online gradient descent algorithm for MLP

Online-gradient-descent(D, number of iterations)
    initialize all weights w_{j,i}(k)
    for i = 1 : number of iterations do
        select a data point D_u = <x, y> from D
        set α = 1/i
        compute the outputs x_j(k) for each unit
        compute the derivatives δ_i(k) via backpropagation
        update all weights (in parallel): w_{j,i}(k) <- w_{j,i}(k) - α δ_i(k) x_j(k-1)
    end for
    return weights w
(A code sketch of this procedure on the XOR data appears at the end of these notes.)

XOR example

• No linear decision boundary exists for the XOR data.
[Figures: the XOR data; the decision surface of a single linear unit; the decision surfaces of neural networks with 2 and with 10 hidden units.]

Problems with learning MLPs

• The number of units must be decided in advance.
• Learning converges to a local optimum.
• The result is sensitive to the initial set of weights.

MLP in practice

• Optical character recognition of 20x20 digit images (automatic sorting of mail).
• A 5-layer network with multiple output functions: 20x20 = 400 inputs and 10 outputs (digits 0, 1, ..., 9).

    Layer   Neurons   Weights
    5       10        3000
    4       300       1200
    3       1200      50000
    2       784       3136
    1       3136      78400
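As a closing illustration, here is a sketch of the online gradient descent / backpropagation procedure above, applied to the XOR example with a 2-2-1 sigmoid network and the log-likelihood (cross-entropy) measure of fit. The g and forward helpers repeat the earlier sketch so the snippet runs on its own; a constant learning rate is used instead of the α = 1/i schedule of the pseudocode so the tiny demo converges in a reasonable number of steps, and, as noted above, an unlucky initialization can still end in a local optimum. All names and constants are illustrative.

import numpy as np

def g(z):
    # Sigmoid, as in the forward-pass sketch above.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    # Outputs x(0), ..., x(K) of every level (same layout as above: bias in column 0).
    outputs = [np.asarray(x, dtype=float)]
    for W in weights:
        outputs.append(g(W @ np.concatenate(([1.0], outputs[-1]))))
    return outputs

def backprop(y, outputs, weights):
    # One backward sweep: delta_i(k) for every level that has trainable weights.
    deltas = [None] * len(weights)
    deltas[-1] = -(y - outputs[-1])                      # last unit: delta = -(y - f(x, w))
    for k in range(len(weights) - 2, -1, -1):
        W_next = weights[k + 1][:, 1:]                   # drop the bias column
        x_k = outputs[k + 1]                             # x_i(k): output of this level
        deltas[k] = (W_next.T @ deltas[k + 1]) * x_k * (1.0 - x_k)
    return deltas

def online_gradient_descent(D, n_iterations, alpha=0.5, seed=1):
    # Online backpropagation for a 2-2-1 network; constant alpha instead of 1/i.
    rng = np.random.default_rng(seed)
    weights = [rng.normal(scale=0.5, size=(2, 3)),       # hidden layer
               rng.normal(scale=0.5, size=(1, 3))]       # output layer
    for _ in range(n_iterations):
        x, y = D[rng.integers(len(D))]                   # select a data point D_u = <x, y>
        outputs = forward(x, weights)                    # compute outputs for each unit
        deltas = backprop(y, outputs, weights)           # derivatives via backpropagation
        for k, W in enumerate(weights):                  # update all weights (in parallel)
            prev = np.concatenate(([1.0], outputs[k]))
            W -= alpha * np.outer(deltas[k], prev)       # w_{j,i}(k) <- w_{j,i}(k) - alpha * delta_i(k) * x_j(k-1)
    return weights

# XOR: no single linear unit can separate it, but 2 hidden units can.
D = [(np.array([0.0, 0.0]), 0.0), (np.array([0.0, 1.0]), 1.0),
     (np.array([1.0, 0.0]), 1.0), (np.array([1.0, 1.0]), 0.0)]
w = online_gradient_descent(D, n_iterations=20000)
for x, y in D:
    print(x, float(forward(x, w)[-1][0]), "target", y)   # outputs should approach the 0/1 targets

The bias is handled by prepending a constant 1 to each layer's input, which keeps the weight update w_{j,i}(k) <- w_{j,i}(k) - α δ_i(k) x_j(k-1) uniform across all weights, including w_{0,i}(k).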

