CS 416 Artificial Intelligence
Lecture 24: Statistical Learning
Chapter 20

AI: Creating rational agents
- The pursuit of autonomous, rational agents
- It's all about search
  - Varying amounts of model information
    - tree searching (informed/uninformed)
    - simulated annealing
    - value/policy iteration
  - Searching for an explanation of observations
    - Used to develop a model

Searching for explanation of observations
- If I can explain observations, can I predict the future?
  - Can I explain why ten coin tosses are 6 H and 4 T?
  - Can I predict the 11th coin toss?

Running example: Candy
- Surprise Candy
  - Comes in two flavors: cherry (yum) and lime (yuk)
  - All candy is wrapped in the same opaque wrapper
  - Candy is packaged in large bags containing five different allocations of cherry and lime

Statistics
- Given a bag of candy, what distribution of flavors will it have?
  - Let H be the random variable corresponding to your hypothesis
    - H1 = all cherry, H2 = all lime, H3 = 50/50 cherry/lime
  - As you open pieces of candy, let each observation of data D1, D2, D3, ... be either cherry or lime
    - D1 = cherry, D2 = cherry, D3 = lime, ...
  - Predict the flavor of the next piece of candy
    - If the data caused you to believe H1 was correct, you'd pick cherry

Bayesian Learning
- Use available data to calculate the probability of each hypothesis and make a prediction
  - Because each hypothesis has an independent likelihood, we use all their relative likelihoods when making a prediction
- Probabilistic inference using Bayes' rule
  - $P(h_i \mid d) = \alpha \, P(d \mid h_i) \, P(h_i)$, where $P(d \mid h_i)$ is the likelihood, $P(h_i)$ is the hypothesis prior, and $\alpha$ is a normalizing constant
  - The probability of hypothesis $h_i$ being active, given that you observed sequence d, equals (up to normalization) the probability of seeing data sequence d generated by hypothesis $h_i$ multiplied by the prior probability of hypothesis i being active

Prediction of an unknown quantity X
- The likelihood of X happening, given that d has already happened, is a function of how much each hypothesis predicts X can happen given d has happened
  - Even though a hypothesis has a high prediction that X will happen, this prediction will be discounted if the hypothesis itself is unlikely to be true given the observation of d

Details of Bayes' rule
- All observations within d are independent and identically distributed
- The probability of a hypothesis explaining a series of observations, d, is the product of explaining each component

Example
- Prior distribution across hypotheses
  - h1 = 100% cherry = 0.1
  - h2 = 75/25 cherry/lime = 0.2
  - h3 = 50/50 cherry/lime = 0.4
  - h4 = 25/75 cherry/lime = 0.2
  - h5 = 100% lime = 0.1
- Prediction
  - $P(d \mid h_3) = (0.5)^{10}$

Example
- Probabilities for each hypothesis start at the prior value (0.1, 0.2, 0.4, 0.2, 0.1)
- Probability of the h3 hypothesis as 10 lime candies are observed
  - $P(d \mid h_3) \, P(h_3) = (0.5)^{10} \, (0.4)$

Prediction of 11th candy
- If we've observed 10 lime candies, is the 11th lime?
  - Build a weighted sum of each hypothesis's prediction
    - $P(X \mid d) = \sum_i P(X \mid h_i) \, P(h_i \mid d)$, where the first factor comes from the hypothesis and the second from the observations
  - The weighted sum can become expensive to compute
    - Instead use the most probable hypothesis and ignore the others
    - MAP: maximum a posteriori

Overfitting
- Remember overfitting from the NN discussion?
- The number of hypotheses influences predictions
- Too many hypotheses can lead to overfitting

Overfitting Example
- Say we've observed 3 cherry and 7 lime
  - Consider our 5 hypotheses from before
    - prediction is a weighted average of the 5
  - Consider having 11 hypotheses, one for each permutation
    - The 3/7 hypothesis will be 1 and all others will be 0

Learning with Data
- First talk about parameter learning
  - Let's create a hypothesis for candies that says the probability a cherry is drawn is h
  - If we unwrap N candies and c are cherry, what is the likelihood of the data?
  - The log likelihood
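
The expressions that accompanied these last two bullets did not survive in this copy. As a sketch under stated assumptions, not a reconstruction of the slide: suppose the N candies are independent draws, and write $\theta$ for the cherry probability that the hypothesis asserts (the text calls it h; $\theta$ is a symbol introduced here only to keep it distinct from the labels h1 to h5). The likelihood and log likelihood would then take the standard Bernoulli form:

```latex
\begin{align*}
P(d \mid \theta) &= \theta^{c} \, (1 - \theta)^{N - c} \\
L(\theta) &= \log P(d \mid \theta) = c \log \theta + (N - c) \log (1 - \theta) \\
\frac{dL}{d\theta} &= \frac{c}{\theta} - \frac{N - c}{1 - \theta} = 0
  \quad \Longrightarrow \quad \theta = \frac{c}{N}
\end{align*}
```

Under these assumptions the maximum-likelihood value is simply the observed fraction of cherry candies. For example, c = 3 cherries out of N = 10 gives $\theta = 0.3$.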
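
The Bayesian update and the two prediction rules from the five-hypothesis candy example can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function and variable names are hypothetical, the priors are taken as (0.1, 0.2, 0.4, 0.2, 0.1), and observations are assumed i.i.d. so that $P(d \mid h_i)$ is a product over individual candies.

```python
# Minimal sketch of Bayesian learning for the candy example.
# All names here are illustrative assumptions, not from the lecture.

# Prior P(h_i) for h1..h5 and the lime fraction each hypothesis asserts.
PRIORS = [0.1, 0.2, 0.4, 0.2, 0.1]
P_LIME = [0.0, 0.25, 0.5, 0.75, 1.0]


def posterior(observations, priors=PRIORS, p_lime=P_LIME):
    """Return P(h_i | d) for a list of 'cherry'/'lime' observations."""
    unnormalized = []
    for prior, q in zip(priors, p_lime):
        # i.i.d. assumption: the likelihood is a product over observations.
        likelihood = 1.0
        for obs in observations:
            likelihood *= q if obs == "lime" else (1.0 - q)
        unnormalized.append(likelihood * prior)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observations are impossible under every hypothesis")
    # Dividing by the total plays the role of the normalizing constant.
    return [u / total for u in unnormalized]


def predict_lime_bayes(observations):
    """Full Bayesian prediction: weighted sum over all hypotheses."""
    post = posterior(observations)
    return sum(q * p for q, p in zip(P_LIME, post))


def predict_lime_map(observations):
    """MAP prediction: use only the single most probable hypothesis."""
    post = posterior(observations)
    best = max(range(len(post)), key=post.__getitem__)
    return P_LIME[best]


if __name__ == "__main__":
    data = ["lime"] * 10
    print("posterior:", [round(p, 4) for p in posterior(data)])
    print("Bayes P(next is lime):", round(predict_lime_bayes(data), 4))
    print("MAP   P(next is lime):", predict_lime_map(data))
```

After ten limes, the weighted sum puts the probability that the next candy is lime at roughly 0.97, while the MAP rule commits entirely to h5 and predicts lime with probability 1. The two answers are close here because h5 already dominates the posterior; with fewer observations they can differ noticeably.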