Berkeley COMPSCI 182 - Reinforcement Learning - D889221


School: University of California, Berkeley

Course: COMPSCI 182 - Neural Basis of Thought and Language

Pages: 8

CS 182
Reinforcement Learning

An example RL domain
• Solitaire
  – What is the state space?
  – What are the actions?
  – What is the transition function?
      • Is it deterministic?
  – What are the rewards?
• (What about Tetris?)

MDPs
• Markov Decision Processes
• What makes them “Markov”?
• General routine
  – Start with a state, s
  – a = π(s)
  – s' = T(s,a)
  – r = R(s,a,s')
  – s = s'; repeat

Policies and values
• What are policies?
• What are value functions?
• How are they related?

Bellman equation
• How are V(s) and Q(s,a) related?

Reward and utility
• Do you keep track of utility?
• Do you have a value function V(s) or Q(s,a)?
• How do you value future rewards?

Policies etc.
• Consider “micro pac-man world”
  – 4 squares, 1 ghost, move in 4 cardinal directions or stay still
  – What's a reasonable policy for the domain?
  – What are the Q-values for this policy?
  – What would the RL algorithms do from here?
      • value iteration a.k.a. dynamic programming
      • Q-learning

Issues with RL
• What happens when the state space gets big?
  – or continuous?
• What if there's someone else in the environment?
• How do you learn faster than thousands of
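The questions under "MDPs", "Bellman equation" and "Policies etc." can be made concrete with a small sketch. It assumes a hypothetical version of the micro pac-man world: a 2x2 grid of squares numbered 0 to 3, a ghost that stays on square 3, a deterministic transition function where bumping into a wall leaves pac-man in place, a reward of -10 for landing on the ghost's square and +1 for any other step, and a discount factor γ = 0.9 for future rewards. None of these numbers come from the slides. The hypothetical helper q_from_v relates the two value functions through Q(s,a) = R(s,a,s') + γ V(s') with s' = T(s,a), and V(s) = max_a Q(s,a); value_iteration repeats that Bellman backup until V stops changing, and run_episode follows the general routine a = π(s), s' = T(s,a), r = R(s,a,s').

```python
# Hypothetical "micro pac-man world": a 2x2 grid (squares 0..3), pac-man is the
# only mover, and the ghost is ASSUMED to sit still on square 3.
GHOST = 3
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1), "stay": (0, 0)}
STATES = range(4)
GAMMA = 0.9  # assumed discount factor for future rewards


def T(s, a):
    """Deterministic transition: move one square; bumping a wall means staying put."""
    row, col = divmod(s, 2)
    dr, dc = ACTIONS[a]
    nr, nc = row + dr, col + dc
    if 0 <= nr < 2 and 0 <= nc < 2:
        return nr * 2 + nc
    return s


def R(s, a, s_next):
    """Assumed reward: -10 for landing on the ghost, +1 for every other step."""
    return -10.0 if s_next == GHOST else 1.0


def q_from_v(V, s, a):
    """One-step lookahead: Q(s,a) = R(s,a,s') + gamma * V(s'), with s' = T(s,a)."""
    s_next = T(s, a)
    return R(s, a, s_next) + GAMMA * V[s_next]


def value_iteration(tol=1e-8):
    """Apply the Bellman backup V(s) = max_a Q(s,a) until V stops changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        new_V = {s: max(q_from_v(V, s, a) for a in ACTIONS) for s in STATES}
        if max(abs(new_V[s] - V[s]) for s in STATES) < tol:
            return new_V
        V = new_V


def greedy_policy(V):
    """pi(s) = argmax_a Q(s,a), using the one-step lookahead above."""
    return {s: max(ACTIONS, key=lambda a: q_from_v(V, s, a)) for s in STATES}


def run_episode(pi, s, steps):
    """General MDP routine: a = pi(s); s' = T(s,a); r = R(s,a,s'); s = s'; repeat.
    Returns the discounted sum of rewards collected along the way."""
    total = 0.0
    for t in range(steps):
        a = pi[s]
        s_next = T(s, a)
        r = R(s, a, s_next)
        total += (GAMMA ** t) * r
        s = s_next
    return total
```

Under these assumptions every square converges to V(s) ≈ 1/(1 - γ) = 10, because pac-man can always collect +1 per step while avoiding the ghost, and an action that steps onto the ghost has Q ≈ -10 + 0.9 · 10 = -1. The greedy policy read off those Q-values never moves onto square 3.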

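The Q-learning entry under "Policies etc." estimates Q(s,a) from sampled experience instead of looking up T and R directly. A minimal sketch, assuming a hypothetical micro pac-man world (2x2 grid, ghost fixed on square 3, -10 for landing on the ghost, +1 otherwise, discount 0.9), an assumed learning rate α = 0.5 and ε-greedy exploration with ε = 0.2. The names step and q_learning are illustrative, not from the slides.

```python
import random

# Hypothetical micro pac-man world: squares 0..3 on a 2x2 grid, ghost ASSUMED fixed on 3.
GHOST = 3
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1), "stay": (0, 0)}
ACTIONS = list(MOVES)
GAMMA = 0.9    # assumed discount factor
ALPHA = 0.5    # assumed learning rate
EPSILON = 0.2  # assumed exploration rate


def step(s, a):
    """Environment sample: returns (s', r). The learner never inspects this code."""
    row, col = divmod(s, 2)
    dr, dc = MOVES[a]
    nr, nc = row + dr, col + dc
    s_next = nr * 2 + nc if (0 <= nr < 2 and 0 <= nc < 2) else s
    r = -10.0 if s_next == GHOST else 1.0
    return s_next, r


def q_learning(episodes=500, steps=20, seed=0):
    """Tabular Q-learning with epsilon-greedy action selection."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(4) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(4)  # random start square
        for _ in range(steps):
            if rng.random() < EPSILON:
                a = rng.choice(ACTIONS)                     # explore
            else:
                a = max(ACTIONS, key=lambda b: Q[(s, b)])   # exploit
            s_next, r = step(s, a)
            # Move Q(s,a) toward the sampled target r + gamma * max_a' Q(s',a')
            target = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s_next
    return Q
```

Because the agent only sees (s, a, r, s') samples, it needs many steps before its estimates approach the values that value iteration computes directly from T and R, which is one reason the slides ask how to learn faster.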