CS 2710 Foundations of AI
Lecture 23: Supervised learning. Multilayer neural networks.
Milos Hauskrecht, [email protected], 5329 Sennott Square

Announcements
Homework 10:
• due on Thursday, December 2, 2004
Final exam: December 14, 2004 at 1:00pm-3:00pm
• Location: Sennott Square 5129
• Closed book
• Cumulative

Linear units
Linear regression:
  $f(\mathbf{x}) = w_0 + \sum_{j=1}^{d} w_j x_j$
Logistic regression:
  $f(\mathbf{x}) = p(y = 1 \mid \mathbf{x}, \mathbf{w}) = g\big(w_0 + \sum_{j=1}^{d} w_j x_j\big)$
On-line gradient update (the same for both models):
  $w_j \leftarrow w_j + \alpha (y - f(\mathbf{x})) x_j$
  $w_0 \leftarrow w_0 + \alpha (y - f(\mathbf{x}))$

Limitations of basic linear units
• The function is linear in the inputs.
• The decision boundary is linear.

Extensions of simple linear units
Use feature (basis) functions to model nonlinearities, where $\phi_j(\mathbf{x})$ is an arbitrary function of $\mathbf{x}$:
• Linear regression: $f(\mathbf{x}) = w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})$
• Logistic regression: $f(\mathbf{x}) = g\big(w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})\big)$

Regression with the quadratic model
• Limitation of the linear model: a linear hyper-plane only; a non-linear surface can fit the data better.
[Figure: a linear fit vs. a quadratic fit to the same data.]

Classification with the linear model
• The logistic regression model defines a linear decision boundary.
• Example: 2 classes (blue and red points).
[Figure: two classes separated by a linear decision boundary.]

Linear decision boundary
• Here the logistic regression model is not optimal, but not that bad.
[Figure: two partially overlapping classes with a linear decision boundary.]

When does logistic regression fail?
• Example in which the logistic regression model fails.
[Figure: a data set for which no linear decision boundary works.]

Limitations of linear units
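The shared on-line update rule can be sketched in code. This is a minimal illustration (the function and variable names are my own, not from the lecture); one flag switches between the linear and logistic unit, since the update is identical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def online_update(w, x, y, alpha, logistic=True):
    # One on-line gradient step for a single linear unit.
    # w[0] is the bias w_0; w[1:] pair with the inputs x.
    # The identical rule serves linear regression (f = z) and
    # logistic regression (f = g(z)).
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    f = sigmoid(z) if logistic else z
    err = y - f
    w[0] += alpha * err               # w_0 <- w_0 + alpha*(y - f(x))
    for j, xj in enumerate(x):
        w[j + 1] += alpha * err * xj  # w_j <- w_j + alpha*(y - f(x))*x_j
    return w
```

Repeated sweeps over a small data set (e.g., points of the AND function) drive the weights toward a separating boundary whenever a linear one exists.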
[Figure: a parity (XOR-like) data set; the two classes cannot be separated by a line.]
• Logistic regression does not work for parity functions: no linear decision boundary exists.
• Solution: a model of a non-linear decision boundary.

Example: regression with polynomials
Regression with polynomials of degree m:
• Data points: pairs $\langle x, y \rangle$
• Feature functions: $m$ feature functions, $\phi_i(x) = x^i$ for $i = 1, 2, \ldots, m$
• Function to learn:
  $f(x, \mathbf{w}) = w_0 + \sum_{i=1}^{m} w_i \phi_i(x) = w_0 + \sum_{i=1}^{m} w_i x^i$

Learning with extended linear units
Feature (basis) functions model nonlinearities:
• Linear regression: $f(\mathbf{x}) = w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})$
• Logistic regression: $f(\mathbf{x}) = g\big(w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})\big)$
Important property:
• The problem of learning the weights is the same as for the basic linear units.
• Trick: we have changed the inputs, but the model is still linear in the weights.

Learning with feature functions
Function to learn: $f(x, \mathbf{w}) = w_0 + \sum_{i=1}^{m} w_i \phi_i(x)$
On-line gradient update for the $\langle x, y \rangle$ pair:
  $w_j \leftarrow w_j + \alpha (y - f(x, \mathbf{w})) \phi_j(x)$
  $w_0 \leftarrow w_0 + \alpha (y - f(x, \mathbf{w}))$
The gradient updates are of the same form as in the linear and logistic regression models.

Example: regression with polynomials of degree m
On-line update for the $\langle x, y \rangle$ pair:
  $w_j \leftarrow w_j + \alpha (y - f(x, \mathbf{w})) x^j$
  $w_0 \leftarrow w_0 + \alpha (y - f(x, \mathbf{w}))$

Multi-layered neural networks
• An alternative way to introduce nonlinearities to regression/classification models.
• Idea: cascade several simple neural models with logistic units.
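The polynomial-regression update just described can be sketched directly; this is a minimal illustration (names are my own), fitting a noise-free quadratic target with on-line updates:

```python
def poly_features(x, m):
    # Feature functions phi_i(x) = x**i, i = 1..m.
    return [x ** i for i in range(1, m + 1)]

def online_poly_update(w, x, y, m, alpha):
    # On-line gradient step for f(x, w) = w0 + sum_i w_i * x**i.
    # Same form as the plain linear-unit update: the raw input is
    # replaced by its features, and the model stays linear in the weights.
    phi = poly_features(x, m)
    f = w[0] + sum(wi * pi for wi, pi in zip(w[1:], phi))
    err = y - f
    w[0] += alpha * err
    for i, pi in enumerate(phi):
        w[i + 1] += alpha * err * pi
    return w
```

Sweeping repeatedly over exact samples of $y = 1 + 2x + 3x^2$ drives the weights toward $(w_0, w_1, w_2) = (1, 2, 3)$, since the model can represent the target exactly.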
Much like neuron connections.

Multilayer neural network
• Cascades multiple logistic regression units: an input layer, a hidden layer, and an output layer.
• Also called a multilayer perceptron (MLP).
• Example: a (2-layer) classifier with non-linear decision boundaries; hidden units compute $z_1^{(1)}, z_2^{(1)}$ from the inputs with weights $w^{(1)}$, and the output unit computes $z_1^{(2)}$ from the hidden outputs with weights $w^{(2)}$, giving $p(y = 1 \mid \mathbf{x})$.
• Models non-linearities through logistic regression units.
• Can be applied to both regression and binary classification problems:
    classification: $f(\mathbf{x}) = p(y = 1 \mid \mathbf{x}, \mathbf{w})$
    regression (option): $f(\mathbf{x}) = f(\mathbf{x}, \mathbf{w})$
• With multiple hidden layers, the non-linearities are modeled using many hidden logistic regression units organized in layers; the output layer determines whether it is a regression or a binary classification problem.

Learning with MLP
• How to learn the parameters of the neural network? The on-line gradient descent algorithm, with weight updates based on the on-line error $J_{online}(D_i, \mathbf{w})$:
    $w_j \leftarrow w_j - \alpha \frac{\partial}{\partial w_j} J_{online}(D_i, \mathbf{w})$
• We need to compute gradients for the weights in all units. They can be computed in one backward sweep through the net. The process is called back-propagation.

Backpropagation
Notation:
• $x_i(k)$ is the output of unit $i$ on level $k$
• $z_i(k)$ is the input to the sigmoid function on level $k$:
    $z_i(k) = w_{0,i}(k) + \sum_j w_{j,i}(k)\, x_j(k-1)$, with $x_i(k) = g(z_i(k))$
• $w_{j,i}(k)$ is the weight between unit $j$ on level $(k-1)$ and unit $i$ on level $k$

Update a weight using a data point $D_u = \langle \mathbf{x}, y \rangle$:
    $w_{j,i}(k) \leftarrow w_{j,i}(k) - \alpha \frac{\partial}{\partial w_{j,i}(k)} J_{online}(D_u, \mathbf{w})$
Let $\delta_i(k) = \frac{\partial J_{online}(D_u, \mathbf{w})}{\partial z_i(k)}$. Then:
    $\frac{\partial J_{online}(D_u, \mathbf{w})}{\partial w_{j,i}(k)} = \frac{\partial J_{online}(D_u, \mathbf{w})}{\partial z_i(k)}\, \frac{\partial z_i(k)}{\partial w_{j,i}(k)} = \delta_i(k)\, x_j(k-1)$.
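A forward pass through such a two-layer network can be sketched in code. The weights below are hand-picked for illustration (not learned) and solve the parity (XOR) problem that defeats a single logistic unit:

```python
import math

def g(z):
    # logistic (sigmoid) function
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    # Two-layer MLP: logistic hidden units feeding one logistic
    # output unit, p(y=1|x) = g(b2 + sum_i W2[i] * h[i]).
    h = [g(b + sum(w * xj for w, xj in zip(row, x)))
         for row, b in zip(W1, b1)]
    return g(b2 + sum(w2 * hi for w2, hi in zip(W2, h)))

# Hand-picked weights: hidden unit 1 detects "at least one input on",
# hidden unit 2 detects "both inputs on"; the output fires for
# "unit 1 and not unit 2", i.e., XOR.
W1 = [[10.0, 10.0], [10.0, 10.0]]
b1 = [-5.0, -15.0]
W2 = [10.0, -10.0]
b2 = -5.0
```

This demonstrates the non-linear decision boundary: no single linear unit can represent XOR, but a cascade of two logistic layers can.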
The quantity $\delta_i(k)$ is computed from $x_i(k)$ and the $\delta_l(k+1)$ of the next layer:
    $\delta_i(k) = \Big( \sum_l \delta_l(k+1)\, w_{i,l}(k+1) \Big)\, x_i(k)\,(1 - x_i(k))$
For the last unit it is the same as for the regular linear units:
    $\delta(K) = -(y - f(\mathbf{x}, \mathbf{w}))$
It is the same for classification with the log-likelihood measure of fit and for linear regression with the least-squares error, with $D_u = \langle \mathbf{x}, y \rangle$.

Learning with MLP
• On-line gradient descent weight update:
    $w_{j,i}(k) \leftarrow w_{j,i}(k) - \alpha\, \delta_i(k)\, x_j(k-1)$
  where $x_j(k-1)$ is the $j$-th output of the $(k-1)$-th layer, $\delta_i(k)$ is the derivative computed via backpropagation, and $\alpha$ is a learning rate.

On-line gradient descent algorithm for MLP
    Online-gradient-descent(D, number of iterations)
      initialize all weights
      for i = 1 : number of iterations
        do select a data point
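The backward sweep can be sketched as a single on-line update for a one-hidden-layer network. This is a minimal illustration (names are my own); it assumes a logistic output unit trained with the log-likelihood (cross-entropy) measure of fit, so the output delta is $-(y - f)$ as above:

```python
import math

def g(z):
    return 1.0 / (1.0 + math.exp(-z))

def unit_out(w, inputs):
    # Logistic unit: w[0] is the bias, w[1:] pair with the inputs.
    return g(w[0] + sum(wj * a for wj, a in zip(w[1:], inputs)))

def backprop_step(x, y, hidden, out, alpha):
    # One on-line backprop update. `hidden` holds one weight vector per
    # hidden unit, `out` the output unit's weights.
    # Forward sweep:
    h = [unit_out(w, x) for w in hidden]
    f = unit_out(out, h)
    # Backward sweep (all deltas use the pre-update weights):
    d_out = -(y - f)                        # delta at the last unit
    d_hid = [d_out * out[i + 1] * h[i] * (1 - h[i])
             for i in range(len(h))]        # delta_i(k) from the next layer
    # Weight updates: w_{j,i} <- w_{j,i} - alpha * delta_i * x_j
    out[0] -= alpha * d_out
    for i, hi in enumerate(h):
        out[i + 1] -= alpha * d_out * hi
    for i, w in enumerate(hidden):
        w[0] -= alpha * d_hid[i]
        for j, xj in enumerate(x):
            w[j + 1] -= alpha * d_hid[i] * xj
    return f
```

The deltas can be checked against a numerical gradient: perturbing one weight and differencing the log-likelihood loss should reproduce $\delta_i(k)\, x_j(k-1)$.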