CS395T Reinforcement Learning: Theory and Practice
Fall 2004
Peter Stone
Department of Computer Sciences
The University of Texas at Austin
Week 11b: Thursday, November 18th

Good Afternoon Colleagues
• Are there any questions?
• Pending questions:
  − Helicopter: MDP or POMDP?
  − Bagnell Theorem 2 correct?

Logistics
• Tom Dietterich visiting tomorrow:
  "Three Challenges for Machine Learning Research"
  3pm, ACES 2.302

Learning from Primitives
• The main reason I chose this paper. . .
• Another reason: Q-learning works! (Fig. 7)
• Start with 5 hand-coded primitives: (ball, goal) ↦ motor actions
• Memory-based learning for when to pick primitives and which subgoals to use
• Learning from practice amounts to adjusting the kernel function: (state, experience) ↦ value
  − State can be discretized, stored in a table
  − Or approximated continuously with LWPR: locally weighted polynomial regression
• Any active exploration?

Student-led Discussion
• Michael on biological primitives

Discussion Points
• Where does the reward function come from? What's up with corners?
• Can we use the primitives without observations? Or is the action space too large?
• Could you learn from bad trials instead of good ones?
• How could they make this model-based?
• How do primitives compare to options, MAXQ?
• Should we be impressed?
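The kernel-function bullet above ((state, experience) ↦ value) can be made concrete with a small sketch. This is illustrative only, not the paper's implementation: it assumes a Gaussian kernel and a hypothetical toy memory of (state, value) pairs, standing in for either the discretized table or the LWPR variant the slide mentions.

```python
import numpy as np

def kernel_value(query, states, values, bandwidth=0.5):
    """Memory-based value estimate: a Gaussian kernel weights each
    stored experience's value by its similarity to the query state.
    Tuning `bandwidth` is one simple sense in which 'learning from
    practice amounts to adjusting the kernel function'."""
    dists = np.linalg.norm(states - query, axis=1)
    weights = np.exp(-(dists / bandwidth) ** 2)
    return np.sum(weights * values) / np.sum(weights)

# Hypothetical toy memory: three remembered states and their values.
states = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
values = np.array([1.0, 0.0, 0.0])

# A query near the first stored state gets an estimate near its value.
print(kernel_value(np.array([0.1, 0.1]), states, values))
```

A smaller bandwidth makes the estimate lean harder on the nearest stored experiences; a larger one averages more broadly over memory.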