CS395T Reinforcement Learning: Theory and Practice
Fall 2004
Peter Stone
Department of Computer Sciences
The University of Texas at Austin
Week 11b: Thursday, November 18th

Good Afternoon Colleagues
• Are there any questions?
• Pending questions:
  − Helicopter: MDP or POMDP?
  − Bagnell Theorem 2 correct?

Logistics
• Tom Dietterich visiting tomorrow:
  "Three Challenges for Machine Learning Research"
  3pm, ACES 2.302

Learning from Primitives
• The main reason I chose this paper. . .
• Another reason: Q-learning works! (Fig. 7)
• Start with 5 hand-coded primitives: (ball, goal) ↦ motor actions
• Memory-based learning for when to pick primitives and which subgoals to use
• Learning from practice amounts to adjusting the kernel function: (state, experience) ↦ value
  − State can be discretized, stored in a table
  − Or approximated continuously with LWPR: locally weighted polynomial regression
• Any active exploration?

Student-led Discussion
• Michael on biological primitives

Discussion Points
• Where does the reward function come from? What's up with corners?
• Can we use the primitives without observations? Or is the action space too large?
• Could you learn from bad trials instead of good ones?
• How could they make this model-based?
• How do primitives compare to options, MAXQ?
• Should we be impressed?
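The kernel-function bullet above ((state, experience) ↦ value) can be made concrete with a small sketch. This is illustrative only, not the paper's implementation: it assumes a Gaussian kernel and a hypothetical toy memory of (state, value) pairs, standing in for either the discretized table or the LWPR variant the slide mentions.

```python
import numpy as np

def kernel_value(query, states, values, bandwidth=0.5):
    """Memory-based value estimate: a Gaussian kernel weights each
    stored experience's value by its similarity to the query state.
    Tuning `bandwidth` is one simple sense in which 'learning from
    practice amounts to adjusting the kernel function'."""
    dists = np.linalg.norm(states - query, axis=1)
    weights = np.exp(-(dists / bandwidth) ** 2)
    return np.sum(weights * values) / np.sum(weights)

# Hypothetical toy memory: three remembered states and their values.
states = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
values = np.array([1.0, 0.0, 0.0])

# A query near the first stored state gets an estimate near its value.
print(kernel_value(np.array([0.1, 0.1]), states, values))
```

A smaller bandwidth makes the estimate lean harder on the nearest stored experiences; a larger one averages more broadly over memory.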