CS 2710 Foundations of AI
Lecture 23: Supervised learning. Multilayer neural networks.
Milos Hauskrecht, [email protected], Sennott Square

Announcements
Homework 10:
• due on Wednesday, November 30, 2005
Final exam:
• December 14, 2005, 11:00am - 1:00pm
• Location: TBA
• Closed book
• Cumulative
• AI prelim exam

Linear units
Linear regression:
  f(x) = w_0 + \sum_{j=1}^{d} w_j x_j
Logistic regression:
  f(x) = p(y = 1 | x, w) = g(w_0 + \sum_{j=1}^{d} w_j x_j)
On-line gradient update (the same form for both models):
  w_0 ← w_0 + \alpha (y - f(x))
  w_j ← w_j + \alpha (y - f(x)) x_j

Limitations of basic linear units
• Linear regression: the function is linear in the inputs.
• Logistic regression: the decision boundary is linear.

Extensions of simple linear units
• Use feature (basis) functions to model nonlinearities, where \phi_j(x) is an arbitrary function of x:
  Linear regression: f(x) = w_0 + \sum_{j=1}^{m} w_j \phi_j(x)
  Logistic regression: f(x) = g(w_0 + \sum_{j=1}^{m} w_j \phi_j(x))

Regression with the quadratic model
• Limitation of the linear model: it fits a linear hyper-plane only; a non-linear surface can fit the data better.
[Figure: a linear fit vs. a quadratic fit to the same data]

Classification with the linear model
• The logistic regression model defines a linear decision boundary.
• Example: 2 classes (blue and red points)
[Figure: two classes of points separated by a linear decision boundary]

Linear decision boundary
• The logistic regression model is not optimal here, but not that bad.
[Figure: two classes of points and the learned linear decision boundary]

When does logistic regression fail?
• An example in which the logistic regression model fails.
[Figure: two classes of points that no linear decision boundary separates well]

Limitations of linear units
• Logistic regression does not work for parity functions: no linear decision boundary exists.
• Solution: a model of a non-linear decision boundary.
[Figure: parity (XOR-like) data that is not linearly separable]

Example: regression with polynomials
Regression with polynomials of degree m:
• Data points: pairs ⟨x, y⟩
• Feature functions: m feature functions \phi_i(x) = x^i, for i = 1, 2, ..., m, so \phi_1(x) = x, \phi_2(x) = x^2, ..., \phi_m(x) = x^m
• Function to learn:
  f(x, w) = w_0 + \sum_{i=1}^{m} w_i \phi_i(x) = w_0 + \sum_{i=1}^{m} w_i x^i

Learning with extended linear units
• Feature (basis) functions model nonlinearities, for both linear regression and logistic regression.
• Important property: the problem of learning the weights is the same as it was for the linear units.
• Trick: we have changed the inputs, but the model is still linear in the weights.

Learning with feature functions
Function to learn:
  f(x, w) = w_0 + \sum_{i=1}^{k} w_i \phi_i(x)
On-line gradient update for a pair ⟨x, y⟩:
  w_0 ← w_0 + \alpha (y - f(x, w))
  w_j ← w_j + \alpha (y - f(x, w)) \phi_j(x)
• The gradient updates are of the same form as in the linear and logistic regression models.

Example: regression with polynomials of degree m
  f(x, w) = w_0 + \sum_{i=1}^{m} w_i x^i
On-line update for a pair ⟨x, y⟩:
  w_0 ← w_0 + \alpha (y - f(x, w))
  w_j ← w_j + \alpha (y - f(x, w)) x^j
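The two update rules above are easy to run end to end. Here is a minimal Python/NumPy sketch; the names poly_features and online_poly_regression, the learning rate, and the toy data are illustrative choices, not from the lecture:

```python
import numpy as np

def poly_features(x, m):
    """Feature functions phi_i(x) = x**i for i = 1..m."""
    return np.array([x ** i for i in range(1, m + 1)])

def online_poly_regression(data, m, alpha=0.05, epochs=200):
    """On-line gradient updates for f(x, w) = w0 + sum_i w_i x**i."""
    w0, w = 0.0, np.zeros(m)
    for _ in range(epochs):
        for x, y in data:
            phi = poly_features(x, m)
            err = y - (w0 + w @ phi)   # y - f(x, w)
            w0 += alpha * err          # w0 <- w0 + alpha (y - f(x, w))
            w += alpha * err * phi     # w_j <- w_j + alpha (y - f(x, w)) x**j
    return w0, w

# Toy usage: recover y = 1 + 2x - x^2 from noisy samples on [-1, 1].
rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, 200)
ys = 1.0 + 2.0 * xs - xs ** 2 + rng.normal(0.0, 0.05, 200)
w0, w = online_poly_regression(list(zip(xs, ys)), m=2)
print(w0, w)  # should be roughly 1.0 and [2.0, -1.0]
```

Note that the update for w_j is just the generic rule w_j ← w_j + \alpha (y - f(x, w)) \phi_j(x) with \phi_j(x) = x^j, which is the "important property" above: only the inputs changed, not the learning algorithm.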
Multi-layered neural networks
• An alternative way to introduce nonlinearities into regression/classification models.
• Idea: cascade several simple neural models with logistic units, much like neuron connections.

Multilayer neural network
• Cascades multiple logistic regression units.
• Also called a multilayer perceptron (MLP).
• Example: a (2-layer) classifier with non-linear decision boundaries.
[Figure: network diagram with inputs x_1, ..., x_d, hidden units z_1^{(1)}, z_2^{(1)} with weights w^{(1)}, and an output unit z_1^{(2)} with weights w^{(2)} computing p(y = 1 | x)]

Multilayer neural network
• Models non-linearities through logistic regression units.
• Can be applied to both regression and binary classification problems:
  regression: f(x) = f(x, w)
  classification (option): f(x) = p(y = 1 | x, w)

Multilayer neural network
• Non-linearities are modeled using multiple hidden logistic regression units (organized in layers).
• The output layer determines whether it is a regression or a binary classification problem.

Learning with MLP
• How to learn the parameters of the neural network?
• Online gradient descent algorithm:
  – Weight updates are based on the online error J_online(D_i, w):
    w_j ← w_j - \alpha \frac{\partial}{\partial w_j} J_online(D_i, w)
• We need to compute gradients for the weights in all units.
• They can be computed in one backward sweep through the net!
• This process is called back-propagation.

Backpropagation
[Figure: units on levels (k-1), k, and (k+1), connected by weights w_{i,j}(k) and w_{l,i}(k+1)]
Notation:
• x_i(k): the output of unit i on level k
• z_i(k): the input to the sigmoid function of unit i on level k,
  z_i(k) = w_{i,0}(k) + \sum_j w_{i,j}(k) x_j(k-1),  x_i(k) = g(z_i(k))
• w_{i,j}(k): the weight between unit j on level (k-1) and unit i on level k

Backpropagation
Update weight w_{i,j}(k) using a data point D_u = ⟨x, y⟩:
  w_{i,j}(k) ← w_{i,j}(k) - \alpha \frac{\partial}{\partial w_{i,j}(k)} J_online(D_u, w)
Let
  \delta_i(k) = \frac{\partial J_online(D_u, w)}{\partial z_i(k)}
Then:
  \frac{\partial J_online(D_u, w)}{\partial w_{i,j}(k)} = \frac{\partial J_online(D_u, w)}{\partial z_i(k)} \frac{\partial z_i(k)}{\partial w_{i,j}(k)} = \delta_i(k) x_j(k-1)
\delta_i(k) is computed from x_i(k) and the \delta_l(k+1) of the next layer:
  \delta_i(k) = x_i(k) (1 - x_i(k)) \sum_l \delta_l(k+1) w_{l,i}(k+1)
For the last unit (the same as for the regular linear units):
  \delta_i(K) = -(y - f(x, w))
It is the same for classification with the log-likelihood measure of fit and for linear regression with the least-squares error!

Learning with MLP
• Online gradient descent algorithm. Weight update:
  w_{i,j}(k) ← w_{i,j}(k) - \alpha \delta_i(k) x_j(k-1)
  since \frac{\partial J_online(D_u, w)}{\partial w_{i,j}(k)} = \delta_i(k) x_j(k-1)
• x_j(k-1): the j-th output of layer (k-1)
• \delta_i(k): the derivative computed via backpropagation
• \alpha: a learning rate

Online gradient descent algorithm for MLP
Online-gradient-descent(D, number of iterations)
  initialize all weights
  for i = 1 : number of iterations
    do select a data point D_u = ⟨x, y⟩ from D
       compute the \delta_i(k) in one backward sweep (backpropagation)
       update every weight: w_{i,j}(k) ← w_{i,j}(k) - \alpha \delta_i(k) x_j(k-1)
  end for
  return weights
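To make the forward and backward sweeps concrete, here is a minimal Python/NumPy sketch of on-line backpropagation for a 2-layer sigmoid MLP; the name train_mlp, the hidden-layer width, the learning rate, and the XOR toy data are illustrative assumptions, not from the lecture:

```python
import numpy as np

def sigmoid(z):
    """The logistic function g(z)."""
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(data, d, m, alpha=0.5, epochs=10000, seed=0):
    """Online backpropagation for an MLP with m hidden sigmoid units and one
    sigmoid output, using the log-likelihood measure of fit."""
    rng = np.random.default_rng(seed)
    W1, b1 = rng.normal(0.0, 1.0, (m, d)), np.zeros(m)  # hidden weights w^(1)
    W2, b2 = rng.normal(0.0, 1.0, m), 0.0               # output weights w^(2)
    for _ in range(epochs):
        for x, y in data:
            # Forward sweep: x_i(1) = g(z_i(1)), f(x) = p(y=1|x,w).
            h = sigmoid(W1 @ x + b1)
            f = sigmoid(W2 @ h + b2)
            # Backward sweep: delta for the last unit is -(y - f);
            # hidden deltas are delta_i(1) = x_i(1)(1 - x_i(1)) delta(2) w_i(2).
            d2 = -(y - f)
            d1 = W2 * d2 * h * (1.0 - h)
            # Weight updates: w <- w - alpha * delta * input.
            W2 -= alpha * d2 * h
            b2 -= alpha * d2
            W1 -= alpha * np.outer(d1, x)
            b1 -= alpha * d1
    return W1, b1, W2, b2

# XOR: the parity function that has no linear decision boundary.
data = [(np.array([0.0, 0.0]), 0), (np.array([0.0, 1.0]), 1),
        (np.array([1.0, 0.0]), 1), (np.array([1.0, 1.0]), 0)]
W1, b1, W2, b2 = train_mlp(data, d=2, m=4)
for x, y in data:
    p = sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)
    print(x, y, round(float(p), 2))  # p should be near y for each point
```

The vector d1 implements \delta_i(k) = x_i(k)(1 - x_i(k)) \sum_l \delta_l(k+1) w_{l,i}(k+1), and every update has the form w ← w - \alpha \delta \cdot input, matching the slides. XOR is chosen because it is exactly the kind of parity problem that the plain logistic regression model above cannot solve.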