CS 188: Artificial Intelligence, Spring 2007
Lecture 29: Post-midterm course review
5/8/2007
Srini Narayanan – ICSI and UC Berkeley

Contents: Final Exam; Utility-Based Agents; Today; Questions; Markov Models; Conditional Independence; Example; Hidden Markov Models; Forward Algorithm; Viterbi Algorithm; Classification; Parameter Estimation; General Naïve Bayes; Estimation: Smoothing; Types of Supervised Classifiers; The Binary Perceptron; The Multiclass Perceptron; Linear Separators; Feature Design; MDP and Reinforcement Learning; Markov Decision Processes; Bellman's Equation for Selecting Actions; Elements of RL; MDPs; Reinforcement Learning; Model-Free Learning; Problems with TD Value Learning; Q-Learning; Applications to NLP; NLP Applications of Bayes' Rule; Ambiguities; Thanks!; Phase II: Update Means

Final Exam
- 8:10 to 11 AM on 5/15/2007 at 50 Birge.
- The final prep page is up. The exam includes all topics (see the page), weighted toward post-midterm topics.
- Two double-sided cheat sheets are allowed, as is a calculator.
- Final exam review: Thursday, 4 PM, Soda 306.

Utility-Based Agents
- (Diagram slide: the utility-based agent architecture.)

Today
- Review of post-midterm topics relevant for the final:
  - Reasoning about time: Markov models; the HMM forward algorithm and Viterbi algorithm.
  - Classification: Naïve Bayes, perceptron.
  - Reinforcement learning: MDPs, value iteration, policy iteration; TD value learning, Q-learning, advanced topics.
  - Applications to NLP.

Questions
- What is the basic conditional independence assertion for Markov models?
- What is a problem with Markov models for prediction into the future?
- What are the basic CI assertions for HMMs?
- How do inference algorithms exploit the CI assertions? (The forward algorithm and the Viterbi algorithm.)

Markov Models
- A Markov model is a chain-structured Bayes net: X1 → X2 → X3 → X4 → …
- Each node is identically distributed (stationarity).
- The value of X at a given time is called the state.
- As a BN, the parameters, called transition probabilities or dynamics, specify how the state evolves over time (along with the initial probabilities).

Conditional Independence
- Basic conditional independence: past and future are
independent given the present.
- Each time step depends only on the previous one; this is called the (first-order) Markov property.
- Note that the chain is just a growing BN; we can always use generic BN reasoning on it (if we truncate the chain).

Example
- Starting from an initial state (an observation of sun, or of rain), compute P(X1), P(X2), P(X3), …, P(X∞): both starting points converge to the same stationary distribution.

Hidden Markov Models
- Markov chains are not so useful for most agents: eventually you don't know anything anymore, so you need observations to update your beliefs.
- Hidden Markov models (HMMs): an underlying Markov chain over states S, with an output (effect) observed at each time step.
- As a Bayes net: hidden states X1 → X2 → X3 → …, each state Xi emitting an observation Ei.

Example
- An HMM is specified by an initial distribution P(X1), transitions P(Xt | Xt−1), and emissions P(Et | Xt).

Conditional Independence
- HMMs have two important independence properties:
  - Markov hidden process: the future depends on the past only via the present.
  - The current observation is independent of everything else given the current state.
- Quiz: does this mean that observations are independent given no evidence? [No; they are correlated by the hidden state.]

Forward Algorithm
- We can ask the same questions for HMMs as for Markov chains.
- Given the current belief state, how do we update it with evidence? This is called monitoring or filtering.
- Formally, we want P(Xt | e1:t), which the forward algorithm computes incrementally:
  P(xt | e1:t) ∝ P(et | xt) Σ over x(t−1) of P(xt | x(t−1)) P(x(t−1) | e1:(t−1)).

Viterbi Algorithm
- Question: what is the most likely state sequence given the observations?
- Slow answer: enumerate all possibilities.
- Better answer: a cached, incremental version (dynamic programming).

Classification
- Supervised models:
  - Generative models: Naïve Bayes.
  - Discriminative models: perceptron.
- Unsupervised models: k-means, agglomerative clustering.

Parameter Estimation
- What are the parameters for Naïve Bayes?
- What is maximum-likelihood estimation for NB?
- What are the problems with ML estimates?

General Naïve Bayes
- A general Naïve Bayes model has a class C and features E1, …, En; we only specify how each feature depends on the class.
- The total number of parameters is linear in n: |C| for the class prior plus n × |E| × |C| for the feature conditionals, versus |C| × |E|^n for the full joint distribution.

Estimation: Smoothing
- Problems with maximum-likelihood (relative-frequency)
estimates:
  - If I flip a coin once and it's heads, what's the estimate for P(heads)?
  - What if I flip 10 times with 8 heads? What if I flip 10M times with 8M heads?
- Basic idea: we have some prior expectation about parameters (here, the probability of heads). Given little evidence, we should skew toward our prior; given a lot of evidence, we should listen to the data.

Estimation: Laplace Smoothing
- Laplace's estimate (extended): pretend you saw every outcome k extra times:
  P_k(x) = (count(x) + k) / (N + k|X|).
- What's Laplace with k = 0? (Just the maximum-likelihood estimate.)
- k is the strength of the prior.
- Laplace for conditionals: smooth each condition independently:
  P_k(x | y) = (count(x, y) + k) / (count(y) + k|X|).
- (Example observations: H, H, T.)

Types of Supervised Classifiers
- Generative models: Naïve Bayes.
- Discriminative models: perceptron.

Questions
- What is a binary threshold perceptron?
- How can we make a multiclass perceptron?
- What sorts of patterns can perceptrons classify correctly?

The Binary Perceptron
- Inputs are features f1, f2, f3, …; each feature has a weight w1, w2, w3, …
- The weighted sum Σi wi·fi is the activation.
- If the activation is positive, output 1; if negative, output 0.

The Multiclass Perceptron
- If we have more than two classes: keep a weight vector for each class, calculate an activation for each class, and let the highest activation win.

Linear Separators
- Binary classification can be viewed as the task of separating classes in feature space: the decision boundary is w · x = 0, with w · x > 0 on one side and w · x < 0 on the other.

Feature Design
- Can we design features f1 and f2 so that a perceptron can separate the two classes?

MDP and Reinforcement Learning
- What is an MDP (basics)?
- What is Bellman's equation, and how is it used in value iteration?
- What is reinforcement learning? TD value learning, Q-learning, and exploration vs.
exploitation.

Markov Decision Processes
- A Markov decision process (MDP) has:
  - A set of states s ∈ S.
  - A model T(s, a, s') = P(s' | s, a): the probability that action a in state s leads to s'.
  - A reward function R(s, a, s') (sometimes just R(s) for leaving a state, or R(s') for entering one).
  - A start state (or distribution), and maybe a terminal state.
- MDPs are the simplest case of reinforcement learning; in general reinforcement learning, we don't know the model or the reward function.

Bellman's Equation for Selecting Actions
- The definition of utility leads to a simple one-step lookahead relationship among optimal utility values:
  V*(s) = max over a of Σ over s' of T(s, a, s') [R(s, a, s') + γ V*(s')].
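Bellman's equation can be turned directly into the value-iteration update V(s) ← max_a Σ_s' T(s, a, s') [R(s, a, s') + γ V(s')]. A minimal sketch on a made-up two-state MDP — the states, actions, transition model, rewards, and γ = 0.9 are all invented for illustration; only the update rule itself comes from the slides:

```python
# Value-iteration sketch on a hypothetical two-state MDP.
# STATES, ACTIONS, T, R, and GAMMA are invented for this example;
# only the Bellman update comes from the course material.

GAMMA = 0.9  # discount factor (assumed)

STATES = ["A", "B"]
ACTIONS = ["stay", "move"]

def T(s, a, s2):
    """Made-up transition model P(s' | s, a)."""
    if a == "stay":
        return 1.0 if s2 == s else 0.0
    return 0.9 if s2 != s else 0.1  # 'move' usually succeeds

def R(s, a, s2):
    """Made-up reward: entering (or staying in) state B pays 1."""
    return 1.0 if s2 == "B" else 0.0

def value_iteration(iters=100):
    """Repeatedly apply V(s) <- max_a sum_s' T(s,a,s') [R + GAMMA * V(s')]."""
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        V = {s: max(sum(T(s, a, s2) * (R(s, a, s2) + GAMMA * V[s2])
                        for s2 in STATES)
                    for a in ACTIONS)
             for s in STATES}
    return V

V = value_iteration()
print(V)  # B is the rewarding state, so V["B"] ends up highest (about 10)
```

The update is a γ-contraction, so the values converge to the same fixed point regardless of the all-zero initialization.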
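Q-learning, listed among the review topics above, learns action values without a model by averaging samples of r + γ max_a' Q(s', a'). A hedged sketch on an invented two-state environment — the dynamics, rewards, α = 0.5, and the purely random exploration policy are all assumptions for illustration; only the update rule reflects the course material:

```python
import random

# Q-learning sketch on an invented two-state world.  Only the update
#   Q(s,a) <- (1 - alpha) Q(s,a) + alpha [r + gamma * max_a' Q(s',a')]
# comes from the course material; everything else is made up.

ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount (assumed)
ACTIONS = ["stay", "move"]

def step(s, a):
    """Made-up deterministic dynamics: 'move' toggles A<->B; being in B pays 1."""
    s2 = ("B" if s == "A" else "A") if a == "move" else s
    return s2, (1.0 if s2 == "B" else 0.0)

def q_learning(steps=2000, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in "AB" for a in ACTIONS}
    s = "A"
    for _ in range(steps):
        a = rng.choice(ACTIONS)  # explore uniformly at random
        s2, r = step(s, a)
        sample = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * sample
        s = s2
    return Q

Q = q_learning()
print(Q)  # moving toward B (or staying there) gets the higher values
```

Note that the learned Q-values order the actions correctly even though the agent never follows a greedy policy while learning; that off-policy property is the point of Q-learning.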
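The Viterbi algorithm reviewed earlier (the "cached incremental version") can be sketched as follows on a made-up rain/sun umbrella HMM. All the probability tables are invented for illustration; the recursion is the standard one: track, for each state, the probability of the best path ending there, with backpointers for recovery.

```python
# Viterbi sketch on a made-up rain/sun "umbrella" HMM.
# INIT, TRANS, and EMIT are invented numbers for illustration.

STATES = ["rain", "sun"]
INIT = {"rain": 0.5, "sun": 0.5}
TRANS = {"rain": {"rain": 0.7, "sun": 0.3},
         "sun":  {"rain": 0.3, "sun": 0.7}}
EMIT = {"rain": {"umbrella": 0.9, "no_umbrella": 0.1},
        "sun":  {"umbrella": 0.2, "no_umbrella": 0.8}}

def viterbi(observations):
    # best[s]: probability of the most likely state sequence ending in s
    best = {s: INIT[s] * EMIT[s][observations[0]] for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        ptr, new = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] * TRANS[p][s])
            ptr[s] = prev
            new[s] = best[prev] * TRANS[prev][s] * EMIT[s][obs]
        backpointers.append(ptr)
        best = new
    # recover the best path by following backpointers from the best end state
    path = [max(STATES, key=lambda s: best[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["umbrella", "umbrella", "no_umbrella"]))  # ['rain', 'rain', 'sun']
```

This caches one number per state per time step instead of enumerating all |S|^T sequences, which is exactly the "slow answer vs. better answer" contrast on the slide.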