Pitt CS 2710 - Learning

CS 2710 Foundations of AI
Lecture 21: Learning
Milos Hauskrecht
[email protected]
Sennott Square

Types of learning

• Supervised learning
– Learning a mapping between inputs x and desired outputs y
– A teacher gives the y's for the learning purposes
• Unsupervised learning
– Learning relations between data components
– No specific outputs are given by a teacher
• Reinforcement learning
– Learning a mapping between inputs x and desired outputs y
– A critic does not give the y's, but instead a signal (reinforcement) of how good the answer was
• Other types of learning: concept learning, explanation-based learning, etc.

Supervised learning

Data: a set of n examples $D = \{d_1, d_2, \ldots, d_n\}$, where $d_i = \langle \mathbf{x}_i, y_i \rangle$, $\mathbf{x}_i$ is an input vector, and $y_i$ is the desired output (given by a teacher).
Objective: learn the mapping $f: X \to Y$ s.t. $y_i \approx f(\mathbf{x}_i)$ for all $i = 1, \ldots, n$.
Two types of problems:
• Regression: X is discrete or continuous; Y is continuous
• Classification: X is discrete or continuous; Y is discrete

Supervised learning examples

• Regression (Y is continuous): e.g. debt/equity and earnings → company stock price; future product orders
• Classification (Y is discrete): e.g. a handwritten digit (array of 0/1s) → label "3"

Unsupervised learning

Data: a set of n examples $D = \{d_1, d_2, \ldots, d_n\}$, where $d_i = \mathbf{x}_i$ is a vector of values; no target value (output) y is given.
Objective: learn relations between samples and between the components of samples.
Types of problems:
• Clustering: group together "similar" examples, e.g. patient cases
• Density estimation: model probabilistically the population of samples, e.g. the relations between diseases, symptoms, lab tests, etc.

Unsupervised learning example

• Density estimation: we want to build a probability model of the population from which we draw the samples $d_i = \mathbf{x}_i$.
[Figure: scatter plot of two-dimensional sample points]

Unsupervised learning. Density estimation

• A probability density of a point in the two-dimensional space
– Model used here: a mixture of Gaussians
[Figure: density of the fitted mixture of Gaussians over the sample points]

Reinforcement learning

• We want to learn $f: X \to Y$
• We see samples of x but not y
• Instead of y we get a feedback (reinforcement) from a critic about how good our output was
• The goal is to select the output that leads to the best reinforcement
[Diagram: the learner maps an input sample to an output; the critic returns a reinforcement signal]

Learning

• Assume we see examples of pairs (x, y) and we want to learn the mapping $f: X \to Y$ to predict future y's for values of x
• We get the data; what should we do?
[Figure: scatter plot of (x, y) data points]

Learning bias

• Problem: many possible functions exist for representing the mapping $f: X \to Y$ between x and y
• Which one to choose? Many examples are still unseen!
[Figure: several candidate curves passing through the same data points]

Learning bias

• The problem becomes easier when we make an assumption about the model, say, $f(x) = ax + b$
• The restriction to a linear model is an example of a learning bias
[Figure: a line fit to the data points]

Learning bias

• Bias provides the learner with some basis for choosing among possible representations of the function.
• Forms of bias: constraints, restrictions, model preferences
• Important: there is no learning without a bias!

Learning bias

• Choosing a parametric model or a set of models is still not enough; there are still too many functions
– One for every pair of parameters a, b (see the sketch below)
[Figure: many candidate lines through the same data points]
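A minimal sketch in Python of the point above: fixing the linear family $f(x) = ax + b$ still leaves one candidate function per parameter pair (a, b). The toy data, the candidate pairs, and the use of the mean squared error (defined formally on the next slide) as the misfit measure are illustrative assumptions, not part of the lecture.

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical noisy linear data, standing in for the scatter plots above.
    x = np.linspace(-2.0, 2.0, 20)
    y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.shape)

    def mse(a, b):
        # Mean squared error of the candidate line f(x) = a*x + b on the data.
        return np.mean((y - (a * x + b)) ** 2)

    # Three candidates from the same linear family; each (a, b) pair is a
    # different function, and choosing the family alone does not pick one.
    for a, b in [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]:
        print(f"a={a:.1f}, b={b:.1f} -> MSE = {mse(a, b):.3f}")

Under these assumptions, the error function introduced next is exactly what lets the learner rank the candidates within the chosen family.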
Fitting the data to the model

We are interested in finding the best set of model parameters. How is the best set defined? Our goal is to have the parameters that:
• reduce the misfit between the model and the data
• or, in other words, explain the data the best
Error function: gives a measure of the misfit between the data and the model.
Examples of error functions:
• Mean squared error: $\frac{1}{n}\sum_{i=1}^{n}(y_i - f(\mathbf{x}_i))^2$
• Misclassification error: the average number of misclassified cases, i.e. cases with $y_i \neq f(\mathbf{x}_i)$

Fitting the data to the model

• Linear regression
– A least-squares fit with the linear model; it minimizes $\frac{1}{n}\sum_{i=1}^{n}(y_i - f(\mathbf{x}_i))^2$
[Figure: least-squares line fit to the data points]

Typical learning

Three basic steps:
• Select a model or a set of models (with parameters), e.g. $y = ax + b$
• Select the error function to be optimized, e.g. $\frac{1}{n}\sum_{i=1}^{n}(y_i - f(\mathbf{x}_i))^2$
• Find the set of parameters optimizing the error function
– The model and parameters with the smallest error represent the best fit of the model to the data
But there are problems one must be careful about …

Learning

Problem:
• We fit the model based on past experience (the past examples seen)
• But ultimately we are interested in learning a mapping that performs well on the whole population of examples
Training data: the data used to fit the parameters of the model.
Training error: $\frac{1}{n}\sum_{i=1}^{n}(y_i - f(\mathbf{x}_i))^2$
True (generalization) error, over the whole and not completely known population: $E_{(x,y)}[(y - f(x))^2]$, the expected squared error.
The training error tries to approximate the true error. But does a good training error always imply a good generalization error?

Overfitting

• Assume we have a set of 10 points and we consider polynomial functions as our possible models
[Figure: 10 sample points in the (x, y) plane]

Overfitting

• Fitting a linear function with the mean squared error
• The error is nonzero
[Figure: linear fit to the 10 points]

Overfitting

• Linear vs. cubic polynomial
• The higher-order polynomial leads to a better fit and a smaller error
[Figure: linear and cubic fits to the same 10 points]

Overfitting

• Is it always good to minimize the error on the observed data?
[Figure: a high-degree polynomial oscillating between the 10 points]

Overfitting

• For 10 data points, a degree-9 polynomial gives a perfect fit (Lagrange interpolation). The error is zero.
• Is it always good to minimize the training error? (See the numerical sketch below.)
[Figure: the degree-9 interpolating polynomial passing through all 10 points]
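The overfitting effect above can be reproduced numerically. The following sketch fits polynomials of degree 1, 3, and 9 to 10 points by least squares with numpy's polyfit, and compares the training error with an estimate of the true error on fresh samples from the same population. The synthetic data, the cubic "true" function, and the noise level are assumptions made for illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    def true_f(x):
        # Hypothetical "true" population function; an assumption for this demo.
        return x**3 - 2.0 * x

    # 10 training points: the true function plus observation noise.
    x_train = np.linspace(-1.5, 1.5, 10)
    y_train = true_f(x_train) + rng.normal(scale=0.5, size=x_train.shape)

    # Fresh samples from the same population; their error stands in for the
    # true (generalization) error, which the training error only approximates.
    x_test = rng.uniform(-1.5, 1.5, size=200)
    y_test = true_f(x_test) + rng.normal(scale=0.5, size=x_test.shape)

    for degree in (1, 3, 9):
        coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
        train_mse = np.mean((y_train - np.polyval(coeffs, x_train)) ** 2)
        test_mse = np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)
        print(f"degree {degree}: train MSE = {train_mse:.4f}, "
              f"test MSE = {test_mse:.4f}")

Under these assumptions, the degree-9 fit drives the training MSE to (numerically) zero, matching the Lagrange-interpolation slide, while the test MSE typically grows; numpy may also warn that the high-degree fit is poorly conditioned.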

