Kansas State University
Department of Computing and Information Sciences
CIS 830: Advanced Topics in Artificial Intelligence
Lecture 11: Artificial Neural Networks (1 of 4)
Friday, February 11, 2000
Kiran Nandivada, Department of Computing and Information Sciences, KSU
Readings: "Incorporating Advice into Agents that Learn from Reinforcements," Richard Maclin and Jude W. Shavlik

Presentation Overview
• Paper
– "Incorporating Advice into Agents that Learn from Reinforcements"
– Authors: Richard Maclin and Jude W. Shavlik, Computer Sciences Department, University of Wisconsin
• Overview
– Learning from reinforcements by accepting advice from an external observer
• Goals
– The system accepts the advice
– The external observer can provide advice at any time
[Figure: the OBSERVER watches the LEARNER's behavior and offers advice; the LEARNER takes actions in the ENVIRONMENT and receives state and reinforcement in return]

Terminology
• Reinforcement learning
– Reward or reinforcement: the feedback provided to the agent for the action it performed in the previous state
– Task of learning: the agent learns from this reward and chooses actions that produce the highest cumulative reward (Mitchell, Ch. 13)
– Given: an observation sequence ⟨s0, a0, r0⟩, ⟨s1, a1, r1⟩, ⟨s2, a2, r2⟩, … and a discount factor γ ∈ [0, 1)
– Learn to choose actions that maximize r(t) + γ r(t + 1) + γ² r(t + 2) + …
[Figure: the agent–environment loop: the agent observes the state, performs an action, and receives a reward, producing the sequence s0 a0 r0, s1 a1 r1, s2 a2 r2, …]

Terminology (continued)
• Q-learning
– The agent learns a numerical evaluation function defined over states and actions, and then implements an optimal policy in terms of this evaluation function (Mitchell, Ch. 13); a minimal tabular sketch follows
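The update rule behind these definitions can be made concrete. Below is a minimal tabular Q-learning sketch, not taken from the paper: env is a hypothetical environment object assumed to expose reset() and step(action), actions is the list of available actions, and GAMMA, ALPHA, and EPSILON are illustrative constants.

import random
from collections import defaultdict

GAMMA = 0.9    # discount factor (the slide's gamma)
ALPHA = 0.1    # learning rate
EPSILON = 0.1  # exploration probability

def q_learning(env, actions, episodes=500):
    """Tabular Q-learning against a hypothetical env with reset()/step()."""
    Q = defaultdict(float)  # Q[(state, action)], implicitly initialized to 0
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy: mostly exploit the current Q, sometimes explore
            if random.random() < EPSILON:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # one-step update toward r + gamma * max_a' Q(s', a'), i.e. toward
            # maximizing r(t) + gamma*r(t+1) + gamma^2*r(t+2) + ...
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
            state = next_state
    return Q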
• Connectionist Q-learning
– The utility function is implemented as a neural network whose inputs describe the current state and whose outputs are the utility of each action
[Figure (Mitchell, Ch. 13): a grid world with goal G, showing the immediate reward values r(state, action), the learned Q(state, action) values, the V*(state) values, and one optimal policy]

Presentation Outline
• Issues
– Is the advice given by the external observer used effectively?
– Does it matter in this type of learning when the advice is given?
– Key strength: the use of an external observer enhances the learning process
– Key weakness: the system accepts only a single piece of advice at a time
• Outline
– Advice taking: proposes a strategy in which the steps described by Hayes-Roth, Klahr, and Mostow (1981) can be achieved using reinforcement learning
– Experiments
– Test environment
– Results
– Future work
– Summary

Advice-taking
– Step 1: Provide advice to the agent
• Advice is provided by the external observer whenever the observer feels it is appropriate
– Step 2: Convert the advice to an internal representation
• Advice is expressed in a simple programming language plus a list of terms that specify certain tasks
– Step 3: Convert the advice into a usable form
• Operationalize the advice: convert it into directly interpretable statements
• Requires a compiler for certain task-specific terms
– Step 4: Integrate the reformulated advice into the agent's current knowledge base
• Uses an extended KBANN approach
• Rules are installed "incrementally" into the network
• Advice can be inserted into the network (the connectionist representation of the utility function) at any time during learning
• Example: an agent learning to play a video game; a sample version of the advice provided to the agent appears below

Advice-taking: sample advice
IF   An Enemy IS (Near ∧ West) ∧
     An Obstacle IS (Near ∧ North)
THEN
     MULTIACTION
          MoveEast
          MoveNorth
     END;

WHEN   Surrounded ∧ OKtoPushEast ∧ An Enemy IS Near
REPEAT
     PushEast
     MoveEast
UNTIL  ¬OKtoPushEast ∨ ¬Surrounded

Advice-taking (continued)
• The advice is added to the network by adding hidden units that correspond to it (a rough sketch of this rule-to-unit translation follows the figures below)
[Figure: sensor inputs describing the current state feed both the current hidden units and new hidden units for the advice; all hidden units feed the action outputs, whose values are the expected rewards]
– Allows advice that contains multi-step plans
[Figure (panels A, B): network units encoding the plan: inputs Enemy (Near, West), Obstacle (Near, North), other inputs, and the previous-step units MoveEast(t−1), MoveNorth(t−1), State1(t−1); outputs MoveEast, MoveNorth, and other outputs]
– Allows advice that contains loops
[Figure (panels C, D, E): network units encoding the loop: inputs Surrounded, OKtoPushEast, Enemy Near, other inputs, and the previous-step units PushEast(t−1), MoveEast(t−1), S1(t−1), S2(t−1); state units S1, S2; outputs PushEast, MoveEast]
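The rule-to-unit translation referenced above can be sketched for the simplest case. The following Python sketch illustrates a KBANN-style insertion of a single IF-THEN rule into a one-hidden-layer utility network; it does not cover the multi-step plans or loops in the figures, which additionally require the recurrent state units. The feature and action index maps, the weight magnitude OMEGA, and the network shapes are all assumptions for illustration, not the paper's exact construction.

import numpy as np

# Assumed encodings: binary state features and one output per action.
FEATURES = {"EnemyNear": 0, "EnemyWest": 1, "ObstacleNear": 2, "ObstacleNorth": 3}
ACTIONS = {"MoveEast": 0, "MoveNorth": 1}
OMEGA = 4.0  # a "large" weight used to encode a rule, in the KBANN style

def add_advice_unit(W_in, b, W_out, antecedents, action):
    """Add one hidden unit encoding IF (conjunction of antecedents) THEN action.

    W_in is (hidden x inputs), b is (hidden,), W_out is (actions x hidden):
    the weights of an existing single-hidden-layer utility network."""
    w = np.zeros(W_in.shape[1])
    for name in antecedents:        # conjunctive rule: one big weight per antecedent
        w[FEATURES[name]] = OMEGA
    # bias chosen so a sigmoid unit fires only when every antecedent is true
    bias = -OMEGA * (len(antecedents) - 0.5)
    W_in = np.vstack([W_in, w])
    b = np.append(b, bias)
    # connect the new unit to the advised action with a large positive weight
    col = np.zeros(W_out.shape[0])
    col[ACTIONS[action]] = OMEGA
    W_out = np.hstack([W_out, col[:, None]])
    return W_in, b, W_out

# e.g. the first piece of sample advice above:
# W_in, b, W_out = add_advice_unit(
#     W_in, b, W_out,
#     ["EnemyNear", "EnemyWest", "ObstacleNear", "ObstacleNorth"], "MoveEast")

Because the new weights are ordinary network parameters, later training can refine them, which is what makes Step 5 below possible.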
Advice-taking (continued)
– Allows advice that refers to previously defined terms
[Figure: the new definition of MoveEast is built on top of, and connected to, the old definition of MoveEast]

Experimentation
– Step 5: Judge the value of the advice
• Introduces a Q-learning mechanism to "wash out" poor advice (see the sketch at the end of this section)
• Empirically evaluates the new advice
• Retracts or counteracts bad advice
– Experiments
• Goal: empirically evaluate whether this particular approach to providing advice is better
• Hypothesis 1: the system takes advantage of the advice
• Hypothesis 2: the observer can provide appropriate advice to the agent at any time during training
– Test environment
• Agent performs certain …
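The "wash-out" of Step 5 is possible because the advice units installed in Step 4 remain ordinary trainable weights: if the advice consistently mispredicts utility, continued Q-learning updates shrink its influence. Below is a minimal sketch under the same assumptions as the previous block (one sigmoid hidden layer); the backpropagation step is the standard one for a squared TD error, not code from the paper, and td_target is assumed to come from the usual r + γ · max_a′ Q(s′, a′) rule.

import numpy as np

ALPHA = 0.05  # assumed learning rate

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def q_update(W_in, b, W_out, state, action_idx, td_target):
    """One gradient step on a single-hidden-layer Q-network (squared TD error)."""
    h = sigmoid(W_in @ state + b)      # hidden activations, advice units included
    q = W_out @ h                      # predicted utility of each action
    err = q[action_idx] - td_target    # TD error for the action actually taken
    # gradient of 0.5 * err**2; compute the hidden-layer gradient before
    # modifying W_out so both updates use the same forward pass
    grad_h = err * W_out[action_idx] * h * (1.0 - h)
    W_out[action_idx] -= ALPHA * err * h
    W_in -= ALPHA * np.outer(grad_h, state)
    b -= ALPHA * grad_h
    return W_in, b, W_out

Repeated updates of this kind leave useful advice largely intact while gradually counteracting advice whose predictions the environment keeps contradicting.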