Pitt CS 2710 - Learning

CS 2710 Foundations of AI
Lecture 21: Learning
Milos Hauskrecht
[email protected]
Sennott Square

Types of learning

• Supervised learning
– Learning a mapping between inputs x and desired outputs y
– A teacher gives the y's for the learning purposes
• Unsupervised learning
– Learning relations between data components
– No specific outputs are given by a teacher
• Reinforcement learning
– Learning a mapping between inputs x and desired outputs y
– A critic does not give the y's, but instead a signal (reinforcement) of how good the answer was
• Other types of learning: concept learning, explanation-based learning, etc.

Supervised learning

Data: a set of n examples $D = \{d_1, d_2, \ldots, d_n\}$, where $d_i = \langle \mathbf{x}_i, y_i \rangle$, $\mathbf{x}_i$ is an input vector, and $y_i$ is the desired output (given by a teacher).
Objective: learn the mapping $f: X \to Y$ s.t. $y_i \approx f(\mathbf{x}_i)$ for all $i = 1, \ldots, n$.
Two types of problems:
• Regression: X is discrete or continuous; Y is continuous
• Classification: X is discrete or continuous; Y is discrete

Supervised learning examples

• Regression (Y is continuous): e.g. debt/equity and earnings → company stock price; future product orders
• Classification (Y is discrete): e.g. a handwritten digit (array of 0/1s) → label "3"

Unsupervised learning

Data: a set of n examples $D = \{d_1, d_2, \ldots, d_n\}$, where $d_i = \mathbf{x}_i$ is a vector of values; no target value (output) y is given.
Objective: learn relations between samples and between the components of samples.
Types of problems:
• Clustering: group together "similar" examples, e.g. patient cases
• Density estimation: model probabilistically the population of samples, e.g. the relations between diseases, symptoms, lab tests, etc.

Unsupervised learning example

• Density estimation: we want to build a probability model of the population from which we draw the samples $d_i = \mathbf{x}_i$.
[Figure: scatter plot of two-dimensional sample points]

Unsupervised learning. Density estimation

• A probability density of a point in the two-dimensional space
– Model used here: a mixture of Gaussians
[Figure: density of the fitted mixture of Gaussians over the sample points]

Reinforcement learning

• We want to learn $f: X \to Y$
• We see samples of x but not y
• Instead of y we get a feedback (reinforcement) from a critic about how good our output was
• The goal is to select the output that leads to the best reinforcement
[Diagram: the learner maps an input sample to an output; the critic returns a reinforcement signal]

Learning

• Assume we see examples of pairs (x, y) and we want to learn the mapping $f: X \to Y$ to predict future y's for values of x
• We get the data; what should we do?
[Figure: scatter plot of (x, y) data points]

Learning bias

• Problem: many possible functions exist for representing the mapping $f: X \to Y$ between x and y
• Which one to choose? Many examples are still unseen!
[Figure: several candidate curves passing through the same data points]

Learning bias

• The problem becomes easier when we make an assumption about the model, say, $f(x) = ax + b$
• The restriction to a linear model is an example of a learning bias
[Figure: a line fit to the data points]

Learning bias

• Bias provides the learner with some basis for choosing among possible representations of the function.
• Forms of bias: constraints, restrictions, model preferences
• Important: there is no learning without a bias!

Learning bias

• Choosing a parametric model or a set of models is still not enough; there are still too many functions
– One for every pair of parameters a, b (see the sketch below)
[Figure: many candidate lines through the same data points]
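A minimal sketch in Python of the point above: fixing the linear family $f(x) = ax + b$ still leaves one candidate function per parameter pair (a, b). The toy data, the candidate pairs, and the use of the mean squared error (defined formally on the next slide) as the misfit measure are illustrative assumptions, not part of the lecture.

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical noisy linear data, standing in for the scatter plots above.
    x = np.linspace(-2.0, 2.0, 20)
    y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.shape)

    def mse(a, b):
        # Mean squared error of the candidate line f(x) = a*x + b on the data.
        return np.mean((y - (a * x + b)) ** 2)

    # Three candidates from the same linear family; each (a, b) pair is a
    # different function, and choosing the family alone does not pick one.
    for a, b in [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]:
        print(f"a={a:.1f}, b={b:.1f} -> MSE = {mse(a, b):.3f}")

Under these assumptions, the error function introduced next is exactly what lets the learner rank the candidates within the chosen family.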
Fitting the data to the model

We are interested in finding the best set of model parameters. How is the best set defined? Our goal is to have the parameters that:
• reduce the misfit between the model and the data
• or, in other words, explain the data the best
Error function: gives a measure of the misfit between the data and the model.
Examples of error functions:
• Mean squared error: $\frac{1}{n}\sum_{i=1}^{n}(y_i - f(\mathbf{x}_i))^2$
• Misclassification error: the average number of misclassified cases, i.e. cases with $y_i \neq f(\mathbf{x}_i)$

Fitting the data to the model

• Linear regression
– A least-squares fit with the linear model; it minimizes $\frac{1}{n}\sum_{i=1}^{n}(y_i - f(\mathbf{x}_i))^2$
[Figure: least-squares line fit to the data points]

Typical learning

Three basic steps:
• Select a model or a set of models (with parameters), e.g. $y = ax + b$
• Select the error function to be optimized, e.g. $\frac{1}{n}\sum_{i=1}^{n}(y_i - f(\mathbf{x}_i))^2$
• Find the set of parameters optimizing the error function
– The model and parameters with the smallest error represent the best fit of the model to the data
But there are problems one must be careful about …

Learning

Problem:
• We fit the model based on past experience (the past examples seen)
• But ultimately we are interested in learning a mapping that performs well on the whole population of examples
Training data: the data used to fit the parameters of the model.
Training error: $\frac{1}{n}\sum_{i=1}^{n}(y_i - f(\mathbf{x}_i))^2$
True (generalization) error, over the whole and not completely known population: $E_{(x,y)}[(y - f(x))^2]$, the expected squared error.
The training error tries to approximate the true error. But does a good training error always imply a good generalization error?

Overfitting

• Assume we have a set of 10 points and we consider polynomial functions as our possible models
[Figure: 10 sample points in the (x, y) plane]

Overfitting

• Fitting a linear function with the mean squared error
• The error is nonzero
[Figure: linear fit to the 10 points]

Overfitting

• Linear vs. cubic polynomial
• The higher-order polynomial leads to a better fit and a smaller error
[Figure: linear and cubic fits to the same 10 points]

Overfitting

• Is it always good to minimize the error on the observed data?
[Figure: a high-degree polynomial oscillating between the 10 points]

Overfitting

• For 10 data points, a degree-9 polynomial gives a perfect fit (Lagrange interpolation). The error is zero.
• Is it always good to minimize the training error? (See the numerical sketch below.)
[Figure: the degree-9 interpolating polynomial passing through all 10 points]
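The overfitting effect above can be reproduced numerically. The following sketch fits polynomials of degree 1, 3, and 9 to 10 points by least squares with numpy's polyfit, and compares the training error with an estimate of the true error on fresh samples from the same population. The synthetic data, the cubic "true" function, and the noise level are assumptions made for illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    def true_f(x):
        # Hypothetical "true" population function; an assumption for this demo.
        return x**3 - 2.0 * x

    # 10 training points: the true function plus observation noise.
    x_train = np.linspace(-1.5, 1.5, 10)
    y_train = true_f(x_train) + rng.normal(scale=0.5, size=x_train.shape)

    # Fresh samples from the same population; their error stands in for the
    # true (generalization) error, which the training error only approximates.
    x_test = rng.uniform(-1.5, 1.5, size=200)
    y_test = true_f(x_test) + rng.normal(scale=0.5, size=x_test.shape)

    for degree in (1, 3, 9):
        coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
        train_mse = np.mean((y_train - np.polyval(coeffs, x_train)) ** 2)
        test_mse = np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)
        print(f"degree {degree}: train MSE = {train_mse:.4f}, "
              f"test MSE = {test_mse:.4f}")

Under these assumptions, the degree-9 fit drives the training MSE to (numerically) zero, matching the Lagrange-interpolation slide, while the test MSE typically grows; numpy may also warn that the high-degree fit is poorly conditioned.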

