Pitt CS 2750 - Multi layer neural networks - D2395902

School: University of Pittsburgh

Course: CS 2750 - Machine Learning

Pages: 11

CS 1571 Introduction to AI
Lecture 10
Milos [email protected]
Sennott Square

Multi-layer neural networks

Linear units

Linear regression: $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x}$
Logistic regression: $f(\mathbf{x}) = p(y=1 \mid \mathbf{x}, \mathbf{w}) = g(\mathbf{w}^T\mathbf{x})$
(The input vector includes a constant $x_0 = 1$, so $w_0$ acts as the bias.)

Both models use the same gradient update:
$$\mathbf{w} \leftarrow \mathbf{w} + \alpha \sum_{i=1}^{n} \big(y_i - f(\mathbf{x}_i)\big)\,\mathbf{x}_i$$
Online:
$$\mathbf{w} \leftarrow \mathbf{w} + \alpha \big(y - f(\mathbf{x})\big)\,\mathbf{x}$$

Limitations of basic linear units

Linear regression: $f(\mathbf{x}) = w_0 + \sum_{j=1}^{d} w_j x_j$, a function linear in the inputs!
Logistic regression: $f(\mathbf{x}) = p(y=1 \mid \mathbf{x}, \mathbf{w}) = g\big(w_0 + \sum_{j=1}^{d} w_j x_j\big)$, a linear decision boundary!

Extensions of simple linear units

• Use feature (basis) functions to model nonlinearities:
Linear regression: $f(\mathbf{x}) = w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})$
Logistic regression: $f(\mathbf{x}) = g\big(w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})\big)$
where $\phi_j(\mathbf{x})$ is an arbitrary function of $\mathbf{x}$.

Regression with a quadratic model
[Figure: a quadratic regression fit]

Quadratic decision boundary
[Figure: a quadratic decision boundary in two-dimensional input space]

Multi-layered neural networks
• Offer an alternative way to introduce nonlinearities into regression/classification models
• Idea: cascade several simple logistic regression units
• Motivation: a neuron and its synaptic connections

Model of a neuron
[Figure: inputs weighted by $w_0, w_1, \dots, w_k$ are summed into $z$; a threshold function of $z$ gives the output $y$]

Multilayer neural network
[Figure: input layer $1, x_1, \dots, x_d$; hidden layer of two logistic units with inputs $z_1^{(1)}, z_2^{(1)}$; output layer unit with input $z_1^{(2)}$ computing $p(y=1 \mid \mathbf{x})$]
• Cascades multiple logistic regression units
• Also called a multilayer perceptron (MLP)
• Example: a (2-layer) classifier with non-linear decision boundaries

Multilayer neural network
• Models non-linearities through logistic regression units
• Can be applied to both regression and binary classification problems; the output unit selects the option:
 – regression: $f(\mathbf{x}) = f(\mathbf{x}, \mathbf{w})$
 – classification: $f(\mathbf{x}) = p(y=1 \mid \mathbf{x}, \mathbf{w})$
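The linear units at the start of the lecture share one online update rule; only the output function differs. The following is a minimal pure-Python sketch of that rule, assuming every input vector is augmented with a constant $x_0 = 1$ so that `w[0]` plays the role of $w_0$. The function names `predict`, `online_update`, and `train` are hypothetical, not from the slides.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], x: List[float], logistic: bool) -> float:
    """f(x) = w^T x (linear regression) or g(w^T x) (logistic regression).
    Assumes x[0] == 1 so that w[0] acts as the bias w_0."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    return sigmoid(z) if logistic else z


def online_update(w: List[float], x: List[float], y: float,
                  alpha: float, logistic: bool) -> List[float]:
    """One online step, identical for both models: w <- w + alpha * (y - f(x)) * x."""
    err = y - predict(w, x, logistic)
    return [wj + alpha * err * xj for wj, xj in zip(w, x)]


def train(data: List[Tuple[List[float], float]], alpha: float,
          epochs: int, logistic: bool) -> List[float]:
    """Cycle through the data set, applying the online update to one point at a time."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            w = online_update(w, x, y, alpha, logistic)
    return w
```

Cycling through the noiseless points $y = 1 + 2x$ for $x = 0, \dots, 3$ with `alpha=0.05` should drive `w` toward `[1.0, 2.0]`; on the linearly separable AND data, the logistic version ends up classifying all four points correctly.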
Multilayer neural network
• Non-linearities are modeled using multiple hidden logistic regression units (organized in layers)
• The output layer determines whether it is a regression or a binary classification problem:
 – regression: $f(\mathbf{x}) = f(\mathbf{x}, \mathbf{w})$
 – classification: $f(\mathbf{x}) = p(y=1 \mid \mathbf{x}, \mathbf{w})$
[Figure: input layer $x_1, \dots, x_d$, several hidden layers, output layer]

Learning with MLP
• How do we learn the parameters of the neural network?
• Gradient descent algorithm. On-line version: the weight updates are based on $J_{\text{online}}(D_u, \mathbf{w})$:
$$w_j \leftarrow w_j - \alpha \frac{\partial}{\partial w_j} J_{\text{online}}(D_u, \mathbf{w})$$
• We need to compute gradients for the weights in all units
• They can be computed in one backward sweep through the net!
• The process is called back-propagation

Backpropagation
[Figure: three consecutive levels $(k-1)$, $k$, $(k+1)$; output $x_j(k-1)$ enters unit $i$ through weight $w_{i,j}(k)$, and output $x_i(k)$ enters unit $l$ through weight $w_{l,i}(k+1)$]
• $x_i(k)$: output of unit $i$ on level $k$
• $z_i(k)$: input to the sigmoid function on level $k$:
$$z_i(k) = w_{i,0}(k) + \sum_j w_{i,j}(k)\, x_j(k-1), \qquad x_i(k) = g\big(z_i(k)\big)$$
• $w_{i,j}(k)$: weight between unit $j$ on level $k-1$ and unit $i$ on level $k$

Backpropagation
Update weight $w_{i,j}(k)$ using a data point $D_u$:
$$w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha \frac{\partial}{\partial w_{i,j}(k)} J_{\text{online}}(D_u, \mathbf{w})$$
Let
$$\delta_i(k) = \frac{\partial}{\partial z_i(k)} J_{\text{online}}(D_u, \mathbf{w})$$
Then, by the chain rule (taking $x_0(k-1) = 1$ for the bias weight $w_{i,0}(k)$):
$$\frac{\partial J_{\text{online}}(D_u, \mathbf{w})}{\partial w_{i,j}(k)} = \frac{\partial J_{\text{online}}(D_u, \mathbf{w})}{\partial z_i(k)}\,\frac{\partial z_i(k)}{\partial w_{i,j}(k)} = \delta_i(k)\, x_j(k-1)$$
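Both factors of $\delta_i(k)\,x_j(k-1)$ depend on unit outputs, so a forward pass through the levels comes first. The sketch below computes $z_i(k)$ and $x_i(k) = g(z_i(k))$ level by level and keeps every $x(k)$ for the backward sweep. The function `forward` and the row layout `[w_i0, w_i1, ..., w_im]` per unit are hypothetical choices; it assumes sigmoid hidden units and, for the regression option, a linear output unit.

```python
import math
from typing import List

# Hypothetical layout: weights[k - 1][i] = [w_i0(k), w_i1(k), ..., w_im(k)] for level k
Layer = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights: List[Layer], x: List[float],
            classification: bool = True) -> List[List[float]]:
    """Return the outputs x(0), x(1), ..., x(K) of every level, where x(0) is the input."""
    outputs = [list(x)]
    for k, layer in enumerate(weights, start=1):
        prev = outputs[-1]
        # z_i(k) = w_i0(k) + sum_j w_ij(k) * x_j(k-1)
        z = [row[0] + sum(w * xj for w, xj in zip(row[1:], prev)) for row in layer]
        if k == len(weights) and not classification:
            outputs.append(z)  # regression option: linear output unit, f(x) = z
        else:
            outputs.append([sigmoid(zi) for zi in z])  # x_i(k) = g(z_i(k))
    return outputs
```

For example, with one hidden unit whose weights are all zero, the hidden output is $g(0) = 0.5$; with output weights `[1.0, 2.0]` the regression output is $1 + 2 \cdot 0.5 = 2$, and the classification output is $g(2)$.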
The deltas are computed backward: $\delta_i(k)$ depends only on $x_i(k)$ and the deltas $\delta_l(k+1)$ of the next layer:
$$\delta_i(k) = \Big[\sum_l \delta_l(k+1)\, w_{l,i}(k+1)\Big]\, x_i(k)\big(1 - x_i(k)\big)$$
where $x_i(k)\big(1 - x_i(k)\big) = g'\big(z_i(k)\big)$ is the derivative of the logistic sigmoid.
Last unit, for the data point $D_u = \langle \mathbf{x}, y \rangle$ (the same as for the regular linear units):
$$\delta_i(K) = -\big(y - f(\mathbf{x}, \mathbf{w})\big)$$
This holds both for classification with the log-likelihood measure of fit and for linear regression with the least-squares error!

Learning with MLP
• Online gradient descent algorithm
 – Weight update:
$$w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha \frac{\partial}{\partial w_{i,j}(k)} J_{\text{online}}(D_u, \mathbf{w})$$
$$\frac{\partial}{\partial w_{i,j}(k)} J_{\text{online}}(D_u, \mathbf{w}) = \delta_i(k)\, x_j(k-1)$$
$$w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha\, \delta_i(k)\, x_j(k-1)$$
 where $x_j(k-1)$ is the $j$-th output of layer $k-1$, $\delta_i(k)$ is the derivative computed via backpropagation, and $\alpha$ is a learning rate.

Online gradient descent algorithm for MLP

    Online-gradient-descent (D, number of iterations)
        initialize all weights w_{i,j}(k)
        for i = 1 : number of iterations
            select a data point D_u = <x, y> from D
            set α = 1/i
            compute outputs x_j(k) for each unit
            compute derivatives δ_i(k) via backpropagation
            update all weights (in parallel): w_{i,j}(k) ← w_{i,j}(k) − α δ_i(k) x_j(k−1)
        end for
        return weights w

XOR example
• No linear decision boundary
[Figure: the XOR data set]

XOR example: linear unit
[Figure: decision boundary learned by a linear unit]

XOR example: neural network with 2 hidden units
[Figure: decision boundary learned by the network]

XOR example: neural network with 10 hidden units
[Figure: decision boundary learned by the network]

Problems with learning MLPs
• The number of units must be decided in advance
• Converges to local optima
• Sensitive to the initial set of weights

MLP in practice
• Optical character recognition: digits 20x20
 – Automatic sorting of mail
 – 5-layer network with multiple output functions
 – 10 outputs (0, 1, …, 9); 20x20 = 400 inputs

| Layer | Neurons | Weights |
|---|---|---|
| 5 | 10 | 3000 |
| 4 | 300 | 1200 |
| 3 | 1200 | 50000 |
| 2 | 784 | 3136 |
| 1 | 3136 | 78400 |
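Putting the pieces together, here is a hedged pure-Python sketch of backpropagation and the online gradient descent loop for a network with one hidden layer and a single sigmoid output unit, fitted with the log-likelihood (classification) measure, so the last-unit delta is $-(y - f(\mathbf{x}, \mathbf{w}))$. The helper names (`loss`, `backprop`, `online_gradient_descent`), the uniform initialization range, and the fixed random seed are illustrative assumptions; the schedule $\alpha = 1/i$ follows the algorithm on the slides.

```python
import math
import random
from typing import List, Tuple

# Hypothetical layout: weights[k][i] = [w_i0, w_i1, ..., w_im] for level k + 1
Weights = List[List[List[float]]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights: Weights, x: List[float]) -> List[List[float]]:
    """Outputs x(0), ..., x(K); every unit, including the output, is a sigmoid."""
    outs = [list(x)]
    for layer in weights:
        prev = outs[-1]
        outs.append([sigmoid(row[0] + sum(w * xj for w, xj in zip(row[1:], prev)))
                     for row in layer])
    return outs


def loss(weights: Weights, x: List[float], y: float) -> float:
    """J_online: negative log-likelihood of one binary label y."""
    f = forward(weights, x)[-1][0]
    return -(y * math.log(f) + (1 - y) * math.log(1 - f))


def backprop(weights: Weights, x: List[float], y: float) -> Weights:
    """Gradient dJ/dw_ij(k) = delta_i(k) * x_j(k-1), with x_0(k-1) = 1 for the bias."""
    outs = forward(weights, x)
    K = len(weights)
    deltas: List[List[float]] = [[] for _ in range(K)]
    deltas[K - 1] = [-(y - outs[K][0])]  # single output unit: delta = -(y - f(x, w))
    for k in range(K - 2, -1, -1):  # backward sweep over the hidden levels
        nxt = weights[k + 1]
        deltas[k] = [
            sum(deltas[k + 1][l] * nxt[l][i + 1] for l in range(len(nxt))) * xi * (1 - xi)
            for i, xi in enumerate(outs[k + 1])
        ]
    return [[[d * xj for xj in [1.0] + outs[k]] for d in deltas[k]] for k in range(K)]


def online_gradient_descent(data: List[Tuple[List[float], float]], hidden: int,
                            iterations: int, seed: int = 0) -> Weights:
    rng = random.Random(seed)
    d = len(data[0][0])
    # Small random initial weights (an assumption; the slides only say "initialize")
    weights = [
        [[rng.uniform(-0.5, 0.5) for _ in range(d + 1)] for _ in range(hidden)],
        [[rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]],
    ]
    for i in range(1, iterations + 1):
        x, y = rng.choice(data)  # select a data point D_u = <x, y>
        alpha = 1.0 / i          # learning-rate schedule alpha = 1/i
        grads = backprop(weights, x, y)
        # Update all weights in parallel: w <- w - alpha * delta_i(k) * x_j(k-1)
        weights = [[[w - alpha * g for w, g in zip(wr, gr)] for wr, gr in zip(wl, gl)]
                   for wl, gl in zip(weights, grads)]
    return weights
```

Comparing `backprop` against central finite differences of `loss` is a cheap way to confirm the deltas. Whether training on the XOR data reaches a good solution depends on the initial weights and the learning-rate schedule, in line with the local-optima and initialization caveats listed above.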
