UCSD CSE 254 - Learning to Use Selective Attention and Short-Term Memory


Published: From Animals to Animats, Fourth International Conference on Simulation of Adaptive Behavior (SAB'96), Cape Cod, Massachusetts, September 1996
Written by: Andrew Kachites McCallum
Presented by: James M. Vaccaro, CSE 254, June 4, 2002

Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks

Scenario
• Application: a sequential task of passing trucks in heavy traffic on a four-lane roadway; the agent's car has a limited field of view.
• The agent uses selective attention and action to observe and react to the current state.
• Rewards: -10 for scraping a truck, -1 for causing a truck to beep its horn, and +0.1 otherwise.
• Actions: gaze-forward-left, gaze-forward-center, gaze-forward-right, gaze-backward, shift-to-gaze-lane.
• Sensors: hear horn, gaze object, gaze side, gaze direction, gaze speed, gaze distance, gaze refined distance, gaze color.

Outline
• Issue: Dealing with Too Much and Too Little Information
• New Algorithm: U-Tree Algorithm
• U-Tree Algorithm Components
  1. Utile Distinction Memory
  2. Instance-Based Learning with Nearest Sequence Memory
  3. Utile Suffix Memory and Learning Tree
• U-Tree Algorithm Revisited
  1. Example of Combining Approaches
  2. Transition Instance Chain
  3. Learning U-Tree Algorithm
• Application & Demo
• Results, Inconsistencies & Summary

Dealing with Too Much and Too Little Information
• Too much: there are too many possible sensory states, actions, or observations; the agent needs a compact, context-dependent set of distinctions.
• Too little: not all pertinent states can be observed at any one time.
[Figure: the eight-dimensional feature space (one dimension per sensor: hear horn, gaze object, gaze side, gaze direction, gaze speed, gaze distance, gaze refined distance, gaze color), with |D1| = 2, |D2| = 3, |D3| = 3, |D4| = 2, |D5| = 2, |D6| = 3, |D7| = 2, |D8| = 6, |A| = 5 actions, and |R| = 3 reward levels, unrolled over time steps t = -3, ..., 3.]

U-Tree Algorithm
Key feature: it simultaneously handles
- "too much sensory data"
- "too little sensory data"
Two primary capabilities:
- Selective attention, to prune the tree of instances
- Short-term memory, to augment the perceptual state space
Combines advantages of several previous algorithms:
- Parti-Game, Nearest Sequence Memory
- Utile Distinction Memory
- G-Algorithm
- Utile Suffix Memory
Two key structures:
- A time-ordered chain of instances that represents raw experience
- A tree in which these instances are organized

What Makes Up the U-Tree Algorithm?
Three methods make up the U-Tree algorithm:
(1) Utile Distinction Memory [McCallum 1993]: uses a statistical technique, the utile distinction test, to separate noise from task structure and keep only the short-term memories that help predict reward.
(2) Parti-Game [Moore 1993] & Nearest Sequence Memory [McCallum 1995]: instance-based; finds conjunctions among features.
(3) G-Algorithm [Chapman 1989]: the agent can select which individual features to attend to.
(1) & (2) combine to form Utile Suffix Memory (USM) [McCallum 1995].
New feature: the ability to divide percepts into components to perform selective attention.

1st Technique: Utile Distinction Memory (UDM)
The utile distinction test distinguishes states that have different policy actions or different utilities, and merges states that have the same policy action and the same utility.
Notation:
  Observations: O = {o1, o2, ..., om}, where m is the number of possible observations
  States: S = {s1, s2, ..., sn}, where n is the number of states
  Actions: A = {a1, a2, ..., ak}, where k is the number of possible actions
  Observation probability: P(oj | si)
  Reward: r
  Transition probability: P(sj | si, ak)
  State-action value: q(si, aj)
  State probability (the agent's belief in si at time t): pi(t)
  Utility of the current belief state: U(t)
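To keep this notation concrete before the algorithm below, here is a minimal sketch (not from the paper; the sizes and variable names are invented for illustration) of how these quantities could be stored:

```python
import numpy as np

# Hypothetical problem sizes, for illustration only.
n, k, m = 4, 5, 6                            # n states, k actions, m observations

obs_prob   = np.full((n, m), 1.0 / m)        # P(oj | si), rows sum to 1
trans_prob = np.full((n, k, n), 1.0 / n)     # P(sj | si, ak)
q          = np.zeros((n, k))                # state-action values q(si, aj)
belief     = np.full(n, 1.0 / n)             # state probabilities pi(t), initially uniform
```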
Theorem: the state distinctions necessary for representing the optimal policy are not necessarily sufficient for learning the optimal policy (proof by counterexample).

Utile Distinction Memory Algorithm
State make-up: s(t) = <a(t-1), o(t), r(t)>
1. Agent's belief in sj:  pj(t+1) ∝ P(o(t+1) | sj) · Σ_i P(sj | si, a(t)) · pi(t)
2. Current policy:  Q(p(t), aj) = Σ_i pi(t) · q(si, aj)
3. Update Q-values:  q(si, a(t)) ← (1 - β·pi(t)) · q(si, a(t)) + β·pi(t) · (r(t) + γ·U(t+1)), where β is the learning rate and γ is the temporal discount factor
4. Update expected utility:  U(t+1) = max_a Q(p(t+1), a)
5. Choose best action:  a(t+1) = argmax_a Q(p(t+1), a)

Evaluating Potential Node Splitting Possibilities
[Figure: a state in consideration for splitting, its four outgoing actions, and confidence intervals of future discounted reward for each of the four possible actions.]
Calculate return values over time:
  return[m] = r[m]
  for t = m - 1 down to 0:
    return[t] = r[t] + γ·return[t + 1]
Calculate upper/lower bound statistics (α is the learning rate):
  for t = 1 to m - 1:
    for all transitions trans that use action A[t - 1]:
      w = α · w_trans[t - 1]
      trans.count[A[t]]      += w
      trans.sum[A[t]]        += w · return[t]
      trans.sumsquares[A[t]] += w · (return[t])^2

Splitting Nodes Based on Confidence Intervals
[Figure: confidence intervals on future discounted reward r1-r5 for five possible previous states (part of a state-space model); before the split, states 1-5 all lead into state 6; after the split, state 6 is divided into 6a and 6b.]

Node Splitting Test: Kolmogorov-Smirnov (KS)
Statistical tests in general, with the KS details:
1. Null hypothesis: relative frequency distribution F0(X). KS: assume F0(X) is a Student-t distribution with n-1 degrees of freedom, using a sample mean and standard deviation calculated from the observed future discounted rewards for each action.
2. Alternative hypothesis: observed cumulative frequency distribution Sn(X). KS: store the future discounted reward distribution for each action and set it equal to Sn(X).
3. Decision maker: KS: D = max(|F0(X) - Sn(X)|), the largest absolute difference between F0(X) and Sn(X).
4. Rejection region: KS: for a specified α and sample size n, reject if D exceeds the critical value dα. Promote the split nodes from the previous state if the null hypothesis is rejected (a code sketch of this test appears at the end of these notes).

Utile Distinction Memory (UDM) Example
[Figure: a grid-world in which, over Trials 1-4, UDM progressively splits perceptually aliased states: state 5 into 5a/5b, state 7 into 7a/7b, and state 10 into 10a/10b, while states such as 8, 9, and 12 remain unsplit.]
Theorem: the state distinctions necessary for representing the optimal policy are not necessarily sufficient for learning the optimal policy.
Note: proof by counterexample.

2nd Technique: Nearest Sequence Memory (NSM)
• Learning in a geometric space: k-nearest neighbor, k = 3, in two dimensions.
• Learning in a sequence space: k-nearest neighbor, k = 3, where each stored instance is an (action, percept, reward) triple and the distance from the current instance x is the match length of the preceding history.
[Figure: left, the k = 3 nearest neighbors of a query point x in a two-dimensional space; right, the k = 3 nearest neighbors of the current instance x in an experience sequence (e.g., 1 8 5 5 2 8 5 5 2 5 5 8 5 5 9 1 8 5 5), ranked by match length.]

Nearest Sequence Memory Algorithm (Steps 1 & 2)
Step 1: Find the k nearest neighbors of the chained events:
  n(si, sj) = 1 + n(si-1, sj-1), if (ai-1 = aj-1) ^ (oi-1 = oj-1) ^ (ri-1 = rj-1); 0 otherwise
  v(si) = 1, if n(st, si) is among the k largest match lengths over stored instances sj with aj = ai; 0 otherwise
Step 2: Vote for the best neighbors and resolve ties with ...
  Qt(a) = (1/k) · Σ_{j : aj = a} v(sj) · q(sj, aj)
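As a concrete illustration of Steps 1 and 2, here is a small, self-contained sketch (my own illustration, not the presentation's or McCallum's code): instances are (action, observation, reward) triples kept in a time-ordered chain, match lengths n(st, si) are computed against the current instance, the k longest-matching stored instances for each action vote, and their stored q-values are averaged into Qt(a). Ties are broken here by preferring more recent instances, which is an arbitrary choice in this sketch.

```python
import numpy as np

def match_length(chain, i, j):
    # n(si, sj): number of consecutive preceding (action, obs, reward)
    # triples on which the histories ending at i and j agree.
    n = 0
    while i - 1 >= 0 and j - 1 >= 0 and chain[i - 1] == chain[j - 1]:
        n += 1
        i -= 1
        j -= 1
    return n

def nsm_action_values(chain, q_values, actions, k=3):
    # Estimate Qt(a): the k nearest (longest-matching) stored instances
    # that took action a vote with their q-values.
    t = len(chain) - 1                      # index of the current instance st
    q_t = {}
    for a in actions:
        # Step 1: match length of every earlier instance that took action a
        candidates = [(match_length(chain, t, i), i)
                      for i in range(t) if chain[i][0] == a]
        if not candidates:
            q_t[a] = 0.0
            continue
        # v(si) = 1 for the k largest match lengths (ties: most recent first)
        candidates.sort(reverse=True)
        neighbors = candidates[:k]
        # Step 2: vote by averaging the voters' stored q-values
        q_t[a] = float(np.mean([q_values[i] for _, i in neighbors]))
    return q_t

# Tiny made-up example: each instance is an (action, observation, reward) triple.
chain = [(0, 'A', 0.0), (1, 'B', 0.0), (0, 'A', 0.1),
         (1, 'B', 0.0), (0, 'A', 0.1), (1, 'B', 0.0)]
q_values = [0.0, 0.2, 0.5, 0.3, 0.4, 0.0]   # one q estimate per stored instance
print(nsm_action_values(chain, q_values, actions=[0, 1], k=3))
```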

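Returning to the Kolmogorov-Smirnov node-splitting test described above: the sketch below is one assumed reading of that slide (using scipy, which the presentation does not mention). It fits F0(X) as a Student-t with n-1 degrees of freedom from the sample mean and standard deviation of one batch of future discounted returns, takes Sn(X) as the empirical distribution of a candidate node's returns, and recommends a split when D = max|F0(X) - Sn(X)| is significant at level α.

```python
import numpy as np
from scipy import stats

def should_split(reference_returns, candidate_returns, alpha=0.05):
    # Null hypothesis F0(X): Student-t with n-1 degrees of freedom, located and
    # scaled by the sample mean / standard deviation of the reference returns.
    # Alternative Sn(X): empirical CDF of the candidate node's returns.
    ref = np.asarray(reference_returns, dtype=float)
    cand = np.asarray(candidate_returns, dtype=float)
    f0 = stats.t(df=len(ref) - 1, loc=ref.mean(), scale=ref.std(ddof=1))
    # D = max |F0(X) - Sn(X)| and its p-value, via a one-sample KS test
    d_stat, p_value = stats.kstest(cand, f0.cdf)
    return p_value < alpha, d_stat

# Made-up future discounted returns: the candidate's returns look shifted
# upward relative to the reference, so the test should recommend a split.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=30)
candidate = rng.normal(1.5, 1.0, size=30)
print(should_split(reference, candidate))
```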
