UT CS 395T - Reinforcement Learning: Theory and Practice

Outline: Title Page, Good Afternoon Colleagues, Logistics, Learning from Primitives, Student-led Discussion, Discussion Points

CS 395T
Reinforcement Learning: Theory and Practice
Fall 2004
Peter Stone
Department of Computer Sciences
The University of Texas at Austin
Week 11b: Thursday, November 18th

Good Afternoon Colleagues
• Are there any questions?
• Pending questions:
  − Helicopter: MDP or POMDP?
  − Bagnell Theorem 2: correct?

Logistics
• Tom Dietterich visiting tomorrow: "Three Challenges for Machine Learning Research", 3pm, ACES 2.302

Learning from Primitives
• The main reason I chose this paper...
• Another reason: Q-learning works! (Fig. 7)
• Start with 5 hand-coded primitives: ball, goal ↦ motor actions
• Memory-based learning for when to pick primitives and which subgoals to use
• Learning from practice amounts to adjusting the kernel function: (state, experience) ↦ value (see the sketch after this list)
  − State can be discretized and stored in a table
  − Or approximated continuously with LWPR: locally weighted polynomial regression
• Any active exploration?
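Since the bullets above outline a complete pipeline, here is a minimal, hypothetical Python sketch of it: Q-learning over a handful of hand-coded primitives, with the Q-function read out memory-based through a kernel over stored experiences. This is not the paper's code; all names, dimensions, and constants are assumptions. A discretized state table would replace the kernel readout with an array lookup, and LWPR would fit local polynomial models instead of taking a plain kernel-weighted average.

```python
import numpy as np

# Hypothetical sketch (names and constants are assumptions, not from the
# paper): choose among a few hand-coded primitives with Q-learning, where
# Q is stored memory-based and read out with a kernel over experiences,
# roughly the slide's "(state, experience) -> value".

N_PRIMITIVES = 5          # the slide's five hand-coded primitives
BANDWIDTH = 0.5           # kernel width; "learning from practice" could adjust it
ALPHA, GAMMA = 0.2, 0.95  # Q-learning step size and discount

# Memory of experiences per primitive: lists of (state_vector, q_value).
memory = [[] for _ in range(N_PRIMITIVES)]

def kernel(s, s_prime):
    """Gaussian kernel on states; LWPR would fit local polynomials instead."""
    return np.exp(-np.sum((s - s_prime) ** 2) / (2 * BANDWIDTH ** 2))

def q_value(s, a):
    """Kernel-weighted average of stored values: the memory-based readout."""
    if not memory[a]:
        return 0.0
    weights = np.array([kernel(s, s_i) for s_i, _ in memory[a]])
    values = np.array([v for _, v in memory[a]])
    if weights.sum() < 1e-8:
        return 0.0
    return float(weights @ values / weights.sum())

def q_update(s, a, r, s_next):
    """One Q-learning backup; the corrected target is appended to memory."""
    target = r + GAMMA * max(q_value(s_next, b) for b in range(N_PRIMITIVES))
    old = q_value(s, a)
    memory[a].append((s.copy(), old + ALPHA * (target - old)))

def select_primitive(s, epsilon=0.1):
    """Epsilon-greedy over primitives: one simple answer to the
    'any active exploration?' question above."""
    if np.random.rand() < epsilon:
        return np.random.randint(N_PRIMITIVES)
    return int(np.argmax([q_value(s, a) for a in range(N_PRIMITIVES)]))
```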
Student-led Discussion
• Michael on biological primitives

Discussion Points
• Where does the reward function come from? What's up with corners?
• Can we use the primitives without observations? Or is the action space too large?
• Could you learn from bad trials instead of good ones?
• How could they make this model-based?
• How do primitives compare to options, MAXQ? (the options interface is sketched below)
• Should we be impressed?
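For the options comparison above: an option in the Sutton, Precup and Singh sense bundles an initiation set, an internal policy, and a termination condition. The hypothetical sketch below (illustrative only, not from the slides or the paper) shows why a hand-coded primitive resembles an option whose internals are fixed, with only the choice among options being learned.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of the options interface (Sutton, Precup & Singh).
# A hand-coded primitive looks like an option whose policy is fixed;
# what gets learned is when to invoke it, not what it does internally.

@dataclass
class Option:
    can_start: Callable[[object], bool]     # initiation set I: where the option may begin
    policy: Callable[[object], object]      # internal policy pi: state -> low-level action
    should_stop: Callable[[object], float]  # termination beta: state -> stop probability

# Example: a "go to ball" primitive viewed as an option.
go_to_ball = Option(
    can_start=lambda s: True,                     # available everywhere
    policy=lambda s: "step_toward_ball",          # hand-coded motor behavior
    should_stop=lambda s: 1.0 if s == "at_ball" else 0.0,
)
```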

