HMM MDP RL Mike Stilman and a very frustrated robot robot cmu edu The Robotics Institute Carnegie Mellon University Machine Learning 10 701 Course Review Fall 2005 Copyright 2005 Mike Stilman Dec 14th 2005 THE WORLD A robot is trying to get to the goal z z z Robot R Goal S4 4 States S2 S1 Copyright 2005 Mike Stilman S3 S4 HMM MDP RL Slide 2 1 Markov Can we represent this as a Markov System z Why S2 S3 S1 Copyright 2005 Mike Stilman S4 HMM MDP RL Slide 3 S2 Ok Markov S1 S3 S4 Markov Model z Legitimate Transitions S2 S1 Copyright 2005 Mike Stilman S3 ENCODES WALL S4 HMM MDP RL Slide 4 2 Elementary Markov Learning z Lets It Does Sequence S1 S2 S1 S1 S2 S2 S1 S2 S3 S2 S2 S2 S2 S3 S4 S4 S3 S3 S3 S3 Watch the Robot S2 S1 S3 P St 1 S2 St S1 S4 S1 S2 S4 S3 Copyright 2005 Mike Stilman HMM MDP RL Slide 5 Elementary Markov Learning z Lets It Does Sequence S1 S2 S1 S1 S2 S2 S1 S2 S3 S2 S2 S2 S2 S3 S4 S4 S3 S3 S3 S3 Watch the Robot S2 S3 P St 1 S2 St S1 S1 S2 S1 75 S1 S4 S1 Copyright 2005 Mike Stilman S2 S3 S4 HMM MDP RL Slide 6 3 Elementary Markov Learning z Lets It Does Sequence S1 S2 S1 S1 S2 S2 S1 S2 S3 S2 S2 S2 S2 S3 S4 S4 S3 S3 S3 S3 Watch the Robot 25 S1 5 75 S2 2 25 S4 S3 5 2 25 5 6 Copyright 2005 Mike Stilman HMM MDP RL Slide 7 HMM Observations S2 S3 S1 S4 OK we learned transition probabilities Wall Lets expand our model z Robots have sensors z Our robot can only look up z of the time it s wrong hence frustrated S1 W 25 N 75 Copyright 2005 Mike Stilman S2 W 75 N 25 S3 W 75 N 25 S4 W 25 N 75 HMM MDP RL Slide 8 4 S2 HMM Transitions S3 S1 25 5 S1 75 6 S2 A 5 2 25 S3 S4 2 25 S1 S2 S3 S4 S1 25 75 0 0 S2 25 5 25 0 S3 0 2 6 2 S4 0 0 5 5 S4 5 Transition Matrix Copyright 2005 Mike Stilman HMM MDP RL Slide 9 S2 HMM Observations S3 S1 25 5 S1 A S1 W 25 N 75 S1 25 75 6 S2 25 2 S2 S3 S4 B 75 0 0 5 2 25 W 75 N 25 S4 S3 S4 W 75 N 25 5 W N S1 25 75 S2 75 25 S2 25 5 25 0 S3 75 25 S3 0 2 6 2 S4 25 75 S4 0 0 5 5 W 25 N 75 Observation Matrix Transition Matrix Copyright 2005 Mike Stilman HMM MDP RL Slide 10 5 HMM Complete Model S2 S3 S1 25 5 S1 A 75 6 S2 25 W 75 N 25 2 S1 S3 S4 B S2 75 0 0 5 2 25 W 25 N 75 S4 S3 S4 W 75 N 25 S1 5 W 25 N 75 W N 25 75 S1 5 3 S1 25 S2 75 25 S2 S2 25 5 25 0 S3 75 25 S3 1 S3 0 2 6 2 S4 25 75 S4 1 S4 0 0 5 5 Initial Observation Matrix Transition Matrix Copyright 2005 Mike Stilman HMM MDP RL Slide 11 HMM Where am I S2 S3 S1 S4 z The robot has observed z No Wall z Wall z Wall z No Wall z What is the probability that the robot is now at the goal P q4 S4 N1 W2 W3 N4 Copyright 2005 Mike Stilman HMM MDP RL Slide 12 6 HMM Where am I A FORWARD ALGORITHM t i P O1 Ot qt Si S1 S2 S3 S4 S1 25 75 0 0 S2 25 5 25 0 S3 0 2 6 2 S4 0 0 5 5 Initialization 1 i P O1 qt Si Si P O1 Si z N S1 W N 5 75 375 S2 3 25 075 S3 1 25 025 S4 W S2 75 25 S3 75 25 S4 25 75 A Induction t 1 i i t i Aij Bj Ot 1 N 375 S2 075 S3 025 W 375 25 075 25 25 075 Copyright 2005 Mike Stilman S1 5 S2 3 S3 1 S4 1 HMM MDP RL Slide 13 FORWARD ALGORITHM t i P O1 Ot qt Si S4 75 1 75 075 HMM Where am I S1 N 25 Copyright 2005 Mike Stilman z W S1 B W S1 S2 S3 S4 S1 25 75 0 0 S2 25 5 25 0 S3 0 2 6 2 S4 0 0 5 5 W N S1 25 75 S2 75 25 S3 75 25 S4 25 75 B N S1 5 S2 3 S3 1 S4 1 HMM MDP RL Slide 14 7 HMM Where am I A FORWARD ALGORITHM t i P O1 Ot qt Si S1 S2 S3 S4 S1 25 75 0 0 S2 25 5 25 0 S3 0 2 6 2 S4 0 0 5 5 Induction t 1 i i t i Aij Bj Ot 1 z N S1 S2 W 375 075 S3 025 S4 075 W N 1125 A Induction t 1 i i t i Aij Bj Ot 1 N S3 025 S4 W 375 075 S2 75 25 S3 75 25 S4 25 75 1125 2428 075 25 025 6 075 5 75 075 Copyright 2005 Mike Stilman S1 5 S2 3 S3 1 S4 1 HMM MDP RL Slide 15 FORWARD ALGORITHM t i P O1 Ot qt Si S2 75 HMM Where am I S1 N 25 375 75 075 5 025 2 75 Copyright 2005 Mike Stilman z W S1 B W S1 S2 S3 S4 S1 25 75 0 0 S2 25 5 25 0 S3 0 2 6 2 S4 0 0 5 5 W N S1 25 75 S2 75 25 S3 75 25 S4 25 75 B N S1 5 S2 3 S3 1 S4 1 HMM MDP RL Slide 16 8 HMM Where am I A FORWARD ALGORITHM t i P O1 Ot qt Si S1 S2 S3 S4 S1 25 75 0 0 S2 25 5 25 0 S3 0 2 6 2 S4 0 0 5 5 Induction t 1 i i t i Aij Bj Ot 1 z N S1 W 375 N 1125 S2 075 2428 S3 025 0534 S4 W 075 A Induction t 1 i i t i Aij Bj Ot 1 375 W 1125 S2 075 2428 S3 025 0534 S4 075 S2 75 25 S3 75 25 S4 25 75 0106 Copyright 2005 Mike Stilman S1 5 S2 3 S3 1 S4 1 HMM MDP RL Slide 17 FORWARD ALGORITHM t i P O1 Ot qt Si S1 75 0106 HMM Where am I N N 25 Copyright 2005 Mike Stilman z W S1 B …
View Full Document