10-601 Machine Learning
Markov decision processes (MDPs)

What's missing in HMMs
• HMMs cannot model important aspects of agent interactions:
- No model for rewards
- No model for actions, which can affect these rewards
• These are issues faced by many applications:
- Agents negotiating deals on the web
- A robot which interacts with its environment

Example: No actions
[State diagram: Graduate student (reward 20), Asst. prof (40), Tenured prof. (100), Google (200), On the street (0), Dead (0), with fixed transition probabilities between states; Dead is absorbing.]

Formal definition of MDPs
• A set of states {s_1 ... s_n}
• A set of rewards {r_1 ... r_n}, one reward for each state
• A set of actions {a_1 ... a_m}; the number of actions could be larger than the number of states
• Transition probabilities P^k_{i,j} = P(q_{t+1} = s_j | q_t = s_i, a_t = a_k)

Questions
• What is my expected pay if I am in state i?
• What is my expected pay if I am in state i and perform action a?

Solving MDPs
• No actions: value iteration
• With actions: value iteration, policy iteration

Value computation
• An obvious question for such models: what is the combined expected value of each state?
• What can we expect to earn over our lifetime if we become an Asst. prof.?
• What if we go to industry?
• Before we answer these questions, we need a model for future rewards:
- The value of a current reward is higher than the value of future rewards
- Inflation, confidence
- Example: lottery

Discounted rewards
• The discounted rewards model is specified using a parameter γ
• Total reward = (current reward) + γ(reward at time t+1) + γ^2(reward at time t+2) + ... + γ^k(reward at time t+k) + ...
• This infinite sum converges if 0 < γ < 1

Determining the total reward of a state
• Define J*(s_i) = expected discounted sum of rewards when starting at state s_i
• How do we compute J*(s_i)?

J*(s_i) = r_i + γ(p_{i,1} J*(s_1) + p_{i,2} J*(s_2) + ... + p_{i,n} J*(s_n))

The sum factors in the expected pay over all possible transitions out of s_i. How can we solve this?

Computing J*(s_i)
J*(s_1) = r_1 + γ(p_{1,1} J*(s_1) + p_{1,2} J*(s_2) + ... + p_{1,n} J*(s_n))
J*(s_2) = r_2 + γ(p_{2,1} J*(s_1) + p_{2,2} J*(s_2) + ... + p_{2,n} J*(s_n))
...
J*(s_n) = r_n + γ(p_{n,1} J*(s_1) + p_{n,2} J*(s_2) + ... + p_{n,n} J*(s_n))
• We have n linear equations with n unknowns
• They can be solved in closed form

Iterative approaches
• Solving in closed form is possible, but may be time consuming, and it doesn't generalize to non-linear models
• Alternatively, this problem can be solved iteratively
• Define J^t(s_i) as the expected discounted reward after t steps. How can we compute J^t(s_i)?

J^1(s_i) = r_i
J^2(s_i) = r_i + γ Σ_k p_{i,k} J^1(s_k)
J^{t+1}(s_i) = r_i + γ Σ_k p_{i,k} J^t(s_k)

• We know how to solve this: fill in a dynamic programming table
• But wait ... this is a never-ending task! When do we stop?
• Remember, we have a converging function: we can stop when ||J^{t-1} - J^t|| < ε
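The no-actions iteration above can be sketched in a few lines of Python. The rewards, transition probabilities, and γ = 0.9 are taken from the lecture's four-state worked example (graduate student, assistant professor, Google, dead); the function name and the ε threshold are my own choices, so treat this as a sketch rather than the course's reference code.

```python
GAMMA = 0.9
REWARDS = [20, 40, 200, 0]          # Grad, AsstProf, Google, Dead
# P[i][j] = probability of moving from state i to state j (no actions yet)
P = [
    [0.6, 0.2, 0.2, 0.0],           # Grad: stay / promoted / leave to Google
    [0.0, 0.8, 0.1, 0.1],           # AsstProf
    [0.0, 0.0, 0.9, 0.1],           # Google
    [0.0, 0.0, 0.0, 1.0],           # Dead is absorbing
]

def value_iteration(rewards, trans, gamma, eps=1e-6):
    """Iterate J^{t+1}(s_i) = r_i + gamma * sum_k p_ik * J^t(s_k) to a fixed point."""
    J = list(rewards)               # J^1(s_i) = r_i
    while True:
        J_next = [rewards[i] + gamma * sum(trans[i][k] * J[k] for k in range(len(J)))
                  for i in range(len(J))]
        # Stop when the infinity norm of the change drops below eps
        if max(abs(a - b) for a, b in zip(J, J_next)) < eps:
            return J_next
        J = J_next
```

A single backup from J^1 gives [74.0, 86.8, 362.0, 0.0], which rounds to the t = 2 row of the worked example; at the fixed point the Google state is worth 200 / (1 - 0.81) ≈ 1052.6.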
(Here ||·|| is the infinity norm, which selects the maximal element.)

Example for γ = 0.9
[State diagram: Graduate student (reward 20), Asst. prof (40), Google (200), Dead (0). Grad: 0.6 stay, 0.2 to Asst. prof, 0.2 to Google; Asst. prof: 0.8 stay, 0.1 to Google, 0.1 to Dead; Google: 0.9 stay, 0.1 to Dead; Dead is absorbing.]

t | J^t(Gr) | J^t(P) | J^t(Goo) | J^t(D)
1 |  20     |  40    |  200     | 0
2 |  74     |  87    |  362     | 0
3 | 141     | 135    |  493     | 0
4 | 209     | 182    |  600     | 0

For example, J^2(Gr) = 20 + 0.9*(0.6*20 + 0.2*40 + 0.2*200) = 74

Solving MDPs
• No actions: value iteration
• With actions: value iteration, policy iteration

Adding actions
A Markov decision process:
• A set of states {s_1 ... s_n}
• A set of rewards {r_1 ... r_n}
• A set of actions {a_1 ... a_m}
• Transition probabilities P^k_{i,j} = P(q_{t+1} = s_j | q_t = s_i, a_t = a_k)

Example: Actions
Action A: leave to Google
Action B: stay in academia
[State diagram as before, but now each state's outgoing transition probabilities depend on the chosen action.]

Questions for MDPs
• Now we have actions
• The question changes to the following: given our current state and the possible actions, what is the best action for us in terms of long-term payoff?

Example: Actions
Action A: leave to Google
Action B: stay in academia
[Reduced diagram: Graduate student (20), Asst. prof (40), Google (200), Dead (0). From the graduate-student state, action A moves to Google with probability 0.8 and stays with probability 0.2; action B moves to Asst. prof with probability 0.7 and stays with probability 0.3.]
So should you leave now (right after class) or should you stay at CMU?

Policy
• A policy maps states to actions
• An optimal policy leads to the highest expected returns
• Note that this does not depend on the start state
Example policy: Gr → B, Go → A, Asst. Pr. → A, Ten. Pr. → B

Solving MDPs with actions
• It can be shown that for every MDP there exists an optimal policy (we won't discuss the proof)
• Such a policy guarantees that no other action is expected to yield a higher payoff

Computing the optimal policy: 1. Modified value iteration
• We can compute it by modifying the value iteration method we discussed
• Define p^k_{i,j} as the probability of transitioning from state i to state j when using action k
• Then we compute (also known as Bellman's equation; the inner sum uses the probabilities associated with action k):

J^{t+1}(s_i) = max_k [ r_i + γ Σ_j p^k_{i,j} J^t(s_j) ]

• Run until convergence
• When the algorithm converges, we have computed the best outcome for each state
• We then associate each state with the action that maximizes its return

Value iteration for γ = 0.9
[Same four-state diagram, with the two actions available in the graduate-student state.]

t | J^t(Gr)           | J^t(P) | J^t(Goo) | J^t(D)
1 |  20               |  40    |  200     | 0
2 | 168 (A) / 51 (B)  |  87    |  362     | 0
3 | 311 (A) / 120 (B) | 135    |  493     | 0
4 | 431 (A) / 189 (B) | 182    |  600     | 0
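As a sketch, the modified (Bellman) value iteration for the action example might look like the following Python. The per-action transitions for the graduate-student state are read off the example (A: 0.8 to Google, 0.2 stay; B: 0.7 to Asst. prof, 0.3 stay); giving both actions identical transitions in the other states is my simplifying assumption, since the slides only show a choice at the graduate-student state.

```python
GAMMA = 0.9
REWARDS = [20, 40, 200, 0]                 # Grad, AsstProf, Google, Dead
SHARED = [                                  # states with no real choice: both actions agree
    [0.0, 0.8, 0.1, 0.1],                   # AsstProf
    [0.0, 0.0, 0.9, 0.1],                   # Google
    [0.0, 0.0, 0.0, 1.0],                   # Dead is absorbing
]
# P[a][i][j] = probability of moving from state i to state j under action a
P = {
    "A": [[0.2, 0.0, 0.8, 0.0]] + SHARED,   # Grad leaves to Google w.p. 0.8
    "B": [[0.3, 0.7, 0.0, 0.0]] + SHARED,   # Grad stays, promoted w.p. 0.7
}

def bellman_backup(J):
    """One sweep of J^{t+1}(s_i) = max_k [ r_i + gamma * sum_j p^k_ij * J^t(s_j) ]."""
    n = len(J)
    J_next, policy = [], []
    for i in range(n):
        best_a, best_v = max(
            ((a, REWARDS[i] + GAMMA * sum(P[a][i][j] * J[j] for j in range(n)))
             for a in P),
            key=lambda pair: pair[1],
        )
        J_next.append(best_v)
        policy.append(best_a)               # action achieving the max
    return J_next, policy

J = list(REWARDS)                           # J^1(s_i) = r_i
for _ in range(100):                        # enough sweeps to converge here
    J, policy = bellman_backup(J)
print(policy[0])                            # best action for the graduate student: "A"
```

The first backup already reproduces the t = 2 row: the graduate-student value becomes max(167.6, 50.6) ≈ 168, achieved by action A.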