CS 2710 Foundations of AI
Lecture 23: Supervised learning. Multilayer neural networks.
Milos Hauskrecht, [email protected], 5329 Sennott Square

Announcements
Homework 10:
• due on Thursday, December 2, 2004
Final exam: December 14, 2004 at 1:00pm-3:00pm
• Location: Sennott Square 5129
• Closed book
• Cumulative

Linear units
Linear regression:
  $f(\mathbf{x}) = w_0 + \sum_{j=1}^{d} w_j x_j$
Logistic regression:
  $f(\mathbf{x}) = p(y = 1 \mid \mathbf{x}, \mathbf{w}) = g\big(w_0 + \sum_{j=1}^{d} w_j x_j\big)$
On-line gradient update (the same for both models):
  $w_j \leftarrow w_j + \alpha (y - f(\mathbf{x})) x_j$
  $w_0 \leftarrow w_0 + \alpha (y - f(\mathbf{x}))$

Limitations of basic linear units
• The function is linear in the inputs.
• The decision boundary is linear.

Extensions of simple linear units
Use feature (basis) functions to model nonlinearities, where $\phi_j(\mathbf{x})$ is an arbitrary function of $\mathbf{x}$:
• Linear regression: $f(\mathbf{x}) = w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})$
• Logistic regression: $f(\mathbf{x}) = g\big(w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})\big)$

Regression with the quadratic model
• Limitation of the linear model: a linear hyper-plane only; a non-linear surface can fit the data better.
[Figure: a linear fit vs. a quadratic fit to the same data.]

Classification with the linear model
• The logistic regression model defines a linear decision boundary.
• Example: 2 classes (blue and red points).
[Figure: two classes separated by a linear decision boundary.]

Linear decision boundary
• Here the logistic regression model is not optimal, but not that bad.
[Figure: two partially overlapping classes with a linear decision boundary.]

When does logistic regression fail?
• Example in which the logistic regression model fails.
[Figure: a data set for which no linear decision boundary works.]

Limitations of linear units
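The shared on-line update rule can be sketched in code. This is a minimal illustration (the function and variable names are my own, not from the lecture); one flag switches between the linear and logistic unit, since the update is identical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def online_update(w, x, y, alpha, logistic=True):
    # One on-line gradient step for a single linear unit.
    # w[0] is the bias w_0; w[1:] pair with the inputs x.
    # The identical rule serves linear regression (f = z) and
    # logistic regression (f = g(z)).
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    f = sigmoid(z) if logistic else z
    err = y - f
    w[0] += alpha * err               # w_0 <- w_0 + alpha*(y - f(x))
    for j, xj in enumerate(x):
        w[j + 1] += alpha * err * xj  # w_j <- w_j + alpha*(y - f(x))*x_j
    return w
```

Repeated sweeps over a small data set (e.g., points of the AND function) drive the weights toward a separating boundary whenever a linear one exists.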
[Figure: a parity (XOR-like) data set; the two classes cannot be separated by a line.]
• Logistic regression does not work for parity functions: no linear decision boundary exists.
• Solution: a model of a non-linear decision boundary.

Example: regression with polynomials
Regression with polynomials of degree m:
• Data points: pairs $\langle x, y \rangle$
• Feature functions: $m$ feature functions, $\phi_i(x) = x^i$ for $i = 1, 2, \ldots, m$
• Function to learn:
  $f(x, \mathbf{w}) = w_0 + \sum_{i=1}^{m} w_i \phi_i(x) = w_0 + \sum_{i=1}^{m} w_i x^i$

Learning with extended linear units
Feature (basis) functions model nonlinearities:
• Linear regression: $f(\mathbf{x}) = w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})$
• Logistic regression: $f(\mathbf{x}) = g\big(w_0 + \sum_{j=1}^{m} w_j \phi_j(\mathbf{x})\big)$
Important property:
• The problem of learning the weights is the same as for the basic linear units.
• Trick: we have changed the inputs, but the model is still linear in the weights.

Learning with feature functions
Function to learn: $f(x, \mathbf{w}) = w_0 + \sum_{i=1}^{m} w_i \phi_i(x)$
On-line gradient update for the $\langle x, y \rangle$ pair:
  $w_j \leftarrow w_j + \alpha (y - f(x, \mathbf{w})) \phi_j(x)$
  $w_0 \leftarrow w_0 + \alpha (y - f(x, \mathbf{w}))$
The gradient updates are of the same form as in the linear and logistic regression models.

Example: regression with polynomials of degree m
On-line update for the $\langle x, y \rangle$ pair:
  $w_j \leftarrow w_j + \alpha (y - f(x, \mathbf{w})) x^j$
  $w_0 \leftarrow w_0 + \alpha (y - f(x, \mathbf{w}))$

Multi-layered neural networks
• An alternative way to introduce nonlinearities to regression/classification models.
• Idea: cascade several simple neural models with logistic units.
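The polynomial-regression update just described can be sketched directly; this is a minimal illustration (names are my own), fitting a noise-free quadratic target with on-line updates:

```python
def poly_features(x, m):
    # Feature functions phi_i(x) = x**i, i = 1..m.
    return [x ** i for i in range(1, m + 1)]

def online_poly_update(w, x, y, m, alpha):
    # On-line gradient step for f(x, w) = w0 + sum_i w_i * x**i.
    # Same form as the plain linear-unit update: the raw input is
    # replaced by its features, and the model stays linear in the weights.
    phi = poly_features(x, m)
    f = w[0] + sum(wi * pi for wi, pi in zip(w[1:], phi))
    err = y - f
    w[0] += alpha * err
    for i, pi in enumerate(phi):
        w[i + 1] += alpha * err * pi
    return w
```

Sweeping repeatedly over exact samples of $y = 1 + 2x + 3x^2$ drives the weights toward $(w_0, w_1, w_2) = (1, 2, 3)$, since the model can represent the target exactly.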
Much like neuron connections.

Multilayer neural network
• Cascades multiple logistic regression units: an input layer, a hidden layer, and an output layer.
• Also called a multilayer perceptron (MLP).
• Example: a (2-layer) classifier with non-linear decision boundaries; hidden units compute $z_1^{(1)}, z_2^{(1)}$ from the inputs with weights $w^{(1)}$, and the output unit computes $z_1^{(2)}$ from the hidden outputs with weights $w^{(2)}$, giving $p(y = 1 \mid \mathbf{x})$.
• Models non-linearities through logistic regression units.
• Can be applied to both regression and binary classification problems:
    classification: $f(\mathbf{x}) = p(y = 1 \mid \mathbf{x}, \mathbf{w})$
    regression (option): $f(\mathbf{x}) = f(\mathbf{x}, \mathbf{w})$
• With multiple hidden layers, the non-linearities are modeled using many hidden logistic regression units organized in layers; the output layer determines whether it is a regression or a binary classification problem.

Learning with MLP
• How to learn the parameters of the neural network? The on-line gradient descent algorithm, with weight updates based on the on-line error $J_{online}(D_i, \mathbf{w})$:
    $w_j \leftarrow w_j - \alpha \frac{\partial}{\partial w_j} J_{online}(D_i, \mathbf{w})$
• We need to compute gradients for the weights in all units. They can be computed in one backward sweep through the net. The process is called back-propagation.

Backpropagation
Notation:
• $x_i(k)$ is the output of unit $i$ on level $k$
• $z_i(k)$ is the input to the sigmoid function on level $k$:
    $z_i(k) = w_{0,i}(k) + \sum_j w_{j,i}(k)\, x_j(k-1)$, with $x_i(k) = g(z_i(k))$
• $w_{j,i}(k)$ is the weight between unit $j$ on level $(k-1)$ and unit $i$ on level $k$

Update a weight using a data point $D_u = \langle \mathbf{x}, y \rangle$:
    $w_{j,i}(k) \leftarrow w_{j,i}(k) - \alpha \frac{\partial}{\partial w_{j,i}(k)} J_{online}(D_u, \mathbf{w})$
Let $\delta_i(k) = \frac{\partial J_{online}(D_u, \mathbf{w})}{\partial z_i(k)}$. Then:
    $\frac{\partial J_{online}(D_u, \mathbf{w})}{\partial w_{j,i}(k)} = \frac{\partial J_{online}(D_u, \mathbf{w})}{\partial z_i(k)}\, \frac{\partial z_i(k)}{\partial w_{j,i}(k)} = \delta_i(k)\, x_j(k-1)$.
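A forward pass through such a two-layer network can be sketched in code. The weights below are hand-picked for illustration (not learned) and solve the parity (XOR) problem that defeats a single logistic unit:

```python
import math

def g(z):
    # logistic (sigmoid) function
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    # Two-layer MLP: logistic hidden units feeding one logistic
    # output unit, p(y=1|x) = g(b2 + sum_i W2[i] * h[i]).
    h = [g(b + sum(w * xj for w, xj in zip(row, x)))
         for row, b in zip(W1, b1)]
    return g(b2 + sum(w2 * hi for w2, hi in zip(W2, h)))

# Hand-picked weights: hidden unit 1 detects "at least one input on",
# hidden unit 2 detects "both inputs on"; the output fires for
# "unit 1 and not unit 2", i.e., XOR.
W1 = [[10.0, 10.0], [10.0, 10.0]]
b1 = [-5.0, -15.0]
W2 = [10.0, -10.0]
b2 = -5.0
```

This demonstrates the non-linear decision boundary: no single linear unit can represent XOR, but a cascade of two logistic layers can.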
The quantity $\delta_i(k)$ is computed from $x_i(k)$ and the $\delta_l(k+1)$ of the next layer:
    $\delta_i(k) = \Big( \sum_l \delta_l(k+1)\, w_{i,l}(k+1) \Big)\, x_i(k)\,(1 - x_i(k))$
For the last unit it is the same as for the regular linear units:
    $\delta(K) = -(y - f(\mathbf{x}, \mathbf{w}))$
It is the same for classification with the log-likelihood measure of fit and for linear regression with the least-squares error, with $D_u = \langle \mathbf{x}, y \rangle$.

Learning with MLP
• On-line gradient descent weight update:
    $w_{j,i}(k) \leftarrow w_{j,i}(k) - \alpha\, \delta_i(k)\, x_j(k-1)$
  where $x_j(k-1)$ is the $j$-th output of the $(k-1)$-th layer, $\delta_i(k)$ is the derivative computed via backpropagation, and $\alpha$ is a learning rate.

On-line gradient descent algorithm for MLP
    Online-gradient-descent(D, number of iterations)
      initialize all weights
      for i = 1 : number of iterations
        do select a data point
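The backward sweep can be sketched as a single on-line update for a one-hidden-layer network. This is a minimal illustration (names are my own); it assumes a logistic output unit trained with the log-likelihood (cross-entropy) measure of fit, so the output delta is $-(y - f)$ as above:

```python
import math

def g(z):
    return 1.0 / (1.0 + math.exp(-z))

def unit_out(w, inputs):
    # Logistic unit: w[0] is the bias, w[1:] pair with the inputs.
    return g(w[0] + sum(wj * a for wj, a in zip(w[1:], inputs)))

def backprop_step(x, y, hidden, out, alpha):
    # One on-line backprop update. `hidden` holds one weight vector per
    # hidden unit, `out` the output unit's weights.
    # Forward sweep:
    h = [unit_out(w, x) for w in hidden]
    f = unit_out(out, h)
    # Backward sweep (all deltas use the pre-update weights):
    d_out = -(y - f)                        # delta at the last unit
    d_hid = [d_out * out[i + 1] * h[i] * (1 - h[i])
             for i in range(len(h))]        # delta_i(k) from the next layer
    # Weight updates: w_{j,i} <- w_{j,i} - alpha * delta_i * x_j
    out[0] -= alpha * d_out
    for i, hi in enumerate(h):
        out[i + 1] -= alpha * d_out * hi
    for i, w in enumerate(hidden):
        w[0] -= alpha * d_hid[i]
        for j, xj in enumerate(x):
            w[j + 1] -= alpha * d_hid[i] * xj
    return f
```

The deltas can be checked against a numerical gradient: perturbing one weight and differencing the log-likelihood loss should reproduce $\delta_i(k)\, x_j(k-1)$.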