
Chapter 6: Adaptation

Adaptation
- Assume static world before
- Adaptation = learning
- Previous chapters were about "calculated rationality" – some sort of choice in a static system
- Here, we assume that a person acts, the world reacts, and then the person needs to (choose to) adapt to the world's response
- Thus, "adaptively rational"

Adaptation and Communication
- Rats in a maze: the cheese is a message that means "turn towards here"
- A kid gets in trouble to get attention: the trouble is a message that says "attend to me"
- Someone, or the world, communicates to a person, who then learns and adapts
- Teaching, persuading, influencing, attracting
- Does all communication ask for adaptation by the receiver?

Basic Model
- Reinforcement learning is the main topic of the chapter
o Alternative learning models: S-R learning, modeling behavior, etc.
- Example of a rat, Alfred, in a maze
o Initially random behavior
o Finds some reinforcement
o Adapts behavior to the prospect of reward

Alfred
- Trial and error learning
- Behavior becomes less and less random
- At first,
o Pr(0) = Pl(0) = .50
- Pr is the probability of a right turn
- (0) refers to time zero, before the first trial

Alfred
- Since the food is always on the right, over time, the probability of turning right increases
o Pr(t + 1) = Pr(t) + some increment
- So we can understand our problem as needing to model the increment

Modeling the Increment
- Model 1: a constant increment model
o Suppose the increment is .2
   Pr(0) = .50
   Pr(1) = .70
   Pr(2) = .90
   Pr(3) = 1.10
o Well, it looked reasonable for a while
- Model 2: a constant proportion model
o If the cheese is always on the right, then [1 - Pr(t)] represents how much Alfred has yet to learn
o Suppose we model him this way: at each trial, Alfred learns a constant proportion of what he has left to learn
- Model 2 requires some constant, a, that represents the proportion Alfred learns on each trial
o Pr(1) = Pr(0) + a[1 - Pr(0)]
- This means: the probability of going right on some trial = that probability from the previous trial, plus the rate of learning (a) times what had yet to be learned on the prior trial
o If you find cheese, your probability of going right on the next trial equals your probability of going right before, plus an increment: how fast you learn (from finding cheese) times how much you have left to learn
- For Pr(0) in the rat example, 0 refers to a point in time: it is the rat's initial probability of turning right before receiving any training
- Model 2 never gives a result where the probability is > 1
- Since a is a fraction, at each point you just add a fraction of what was left to learn
- Produces an asymptotic curve, where Alfred's behavior approaches Pr = 1 (see the sketch comparing the two models below)
- P. 225
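Below is a minimal Python sketch, not from the text, contrasting Model 1 (constant increment) with Model 2 (constant proportion). The starting probability of .50, the increment/learning rate of .2, and the six trials are assumed values chosen only for illustration.

```python
# A minimal sketch, assuming a rate of .2 and six rewarded trials purely for
# illustration; neither function comes from the text itself.

def constant_increment(p0=0.50, inc=0.2, trials=6):
    """Model 1: add a fixed increment after every rewarded trial."""
    probs = [p0]
    for _ in range(trials):
        probs.append(probs[-1] + inc)  # eventually exceeds 1.0 (Model 1's flaw)
    return probs

def constant_proportion(p0=0.50, a=0.2, trials=6):
    """Model 2: learn a constant proportion a of what is left to learn."""
    probs = [p0]
    for _ in range(trials):
        p = probs[-1]
        probs.append(p + a * (1 - p))  # approaches 1.0 but never exceeds it
    return probs

print(constant_increment())   # 0.5, 0.7, 0.9, 1.1, ... breaks after a few trials
print(constant_proportion())  # 0.5, 0.6, 0.68, 0.744, ... stays below 1
```

Running both shows why the notes prefer Model 2: the constant increment eventually pushes the probability past 1, while the constant proportion produces the asymptotic curve described above.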
- Model 2 fits observed results reasonably well
- Predicts:
o Learning occurs
o Probabilities rise faster at the start of learning
o Stupid choices remain possible, though decreasingly likely
o We can measure the learning rate, a

11/14/12

Assumptions
- Alternative behaviors are available (left, right)
- State of the individual is described by probabilities that add up to 1
- World responds differently to various behaviors (cheese, no cheese)
- State of the world is described by probabilities of various responses to each behavior (right = 70% cheese)
- Set of possible events is the combination of possible behaviors crossed by possible world responses (left/no reward, left/reward)
- Specify adaptation equations for each possible event (event 1 = left/no reward)

Adaptation Equations
- So the model generates sets of equations, and these describe the learning/adaptation
o Accomplished by explaining how the probabilities change
- Text model will be restricted to
o Two choices (left/right)
o Two outcomes (cheese/no cheese)
o One person or party doing the learning

Alfred
- Two initial choice probabilities
o Pr(t) = probability of doing behavior R at time t
o Pl(t) = 1 - Pr(t) = probability of doing behavior L at time t
- 4 events:
o E1 = L and reward
o E2 = L and no reward
o E3 = R and reward
o E4 = R and no reward

Alfred's Equations
- E1, L and reward: increase Pl, decrease Pr
o Pl(t + 1) = Pl(t) + a[1 - Pl(t)]
o We've seen this before: a is the rate of learning what's yet to be learned
- E2, L and no reward: decrease Pl, increase Pr
o Pl(t + 1) = Pl(t) - b Pl(t)
o The b is new. It is the learning rate associated with non-reward, just as a is the rate for reward. It also takes values between 0 and 1.
o NB: the negative sign before the b
o Minus sign because it's a decrement (no cheese)
o a and b are both fractions between 0 and 1

The b
- Why is it multiplied against Pl(t) instead of [1 - Pl(t)] (like a is)?
- Because the non-reward reduces the likelihood of doing L again
- So, after a non-reward, we multiply Pl by some fraction (i.e., b, such as .2) and subtract that amount from the original Pl

Clicker Questions
- The higher a is, the faster the party learns by being rewarded
- The higher b is, the faster the party learns by not being rewarded

Alfred's Equations
- E3, R and reward: decrease Pl, increase Pr
o Pr(t + 1) = Pr(t) + a[1 - Pr(t)]
- E4, R and no reward: increase Pl, decrease Pr
o Pr(t + 1) = Pr(t) - b Pr(t)

Alfred's Equations
- Notice that a and b have no particular relationship to one another (such as summing to 1 or being equal)

Preliminary Conclusions
- Rewarded behavior becomes more probable, and non-rewarded behavior less probable
- Rewards cause behavior to change at rate a, and non-rewards cause behavior to change at rate b
- The amount of adaptation on any trial is always a constant fraction (a or b) of the amount left to be learned (a simulation sketch of the four equations follows below)
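Below is a minimal Python simulation sketch of the four adaptation equations (E1 through E4) applied over repeated maze trials. The learning rates, the reward probabilities for each side, the trial count, and the random seed are illustrative assumptions rather than values taken from the text.

```python
# A minimal sketch of the E1-E4 adaptation equations; all parameter values
# here are assumptions chosen for illustration only.
import random

def run_trials(pl=0.5, a=0.3, b=0.2, p_reward_left=0.0, p_reward_right=0.7,
               trials=100, seed=1):
    """Return Pl after the given number of maze trials."""
    random.seed(seed)
    for _ in range(trials):
        if random.random() < pl:                  # Alfred turns left
            if random.random() < p_reward_left:   # E1: L and reward
                pl = pl + a * (1 - pl)
            else:                                 # E2: L and no reward
                pl = pl - b * pl
        else:                                     # Alfred turns right
            pr = 1 - pl
            if random.random() < p_reward_right:  # E3: R and reward
                pr = pr + a * (1 - pr)
            else:                                 # E4: R and no reward
                pr = pr - b * pr
            pl = 1 - pr
    return pl

# With cheese mostly on the right, Pl ends well below .5 (Pr well above .5).
print(run_trials())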
A Worked Alfred Problem
- P. 262 f
- Assume Pl(0) = .50
- Assume a = .3
- Assume b = .2
- Make sure you can do the math
- NB: Pl = 1 - Pr (or Pl + Pr = 1)
- Worked problem (the arithmetic is checked in the sketch below)
o a = .5
o b = .1
o Pl(0) = .3
o Reward is on the left
o Alfred can go left or go right
o If he goes left on trial 1 (rewarded): Pl(1) = Pl(0) + a[1 - Pl(0)] = .3 + .5(1 - .3) = .65
   .65 go left, .35 go right
o If he goes right on trial 1 (no reward): Pl(1) = 1 - Pr(1) = 1 - [Pr(0) - b Pr(0)] = 1 - (.7 - .1 × .7) = 1 - .63 = .37
   .37 go left, .63 go right
o .65 + .37 does not equal 1
   These do not need to add up to 1
   The only probabilities that need to add up to 1 are ones that come out of the same node (.3 + .7), but Pl(1) = .65 and Pl(1) = .37 do not come out of the same node
o If he went left on trial 1 and then goes right on trial 2 (probability .35): Pl(2) = .685
o If he went left on trial 1 and goes left again on trial 2 (probability .65): Pl(2) = .825
   Went left twice, got rewarded twice
   Need the increment equation because he was rewarded twice; the probability should be higher
   Pl(2) = Pl(1) + a[1 - Pl(1)] = .65 + .5(.35) = .825
   Pl(2) …
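Below is a small Python check, not part of the notes, of the worked problem's arithmetic (a = .5, b = .1, Pl(0) = .3, reward on the left only), following the adaptation equations above; the variable names are mine.

```python
# A minimal check of the worked problem's numbers; variable names are
# illustrative, and the parameter values are the ones given in the notes.

a, b = 0.5, 0.1
pl0 = 0.3
pr0 = 1 - pl0                                    # .7

# Trial 1, Alfred goes left and is rewarded (E1):
pl1_left = pl0 + a * (1 - pl0)                   # .3 + .5(.7) = .65
# Trial 1, Alfred goes right and is not rewarded (E4):
pr1_right = pr0 - b * pr0                        # .7 - .1(.7) = .63
pl1_right = 1 - pr1_right                        # .37

# Trial 2 after left/left (rewarded twice, E1 again):
pl2_left_left = pl1_left + a * (1 - pl1_left)    # .65 + .5(.35) = .825
# Trial 2 after left then right (right goes unrewarded, E4):
pr2 = (1 - pl1_left) - b * (1 - pl1_left)        # .35 - .1(.35) = .315
pl2_left_right = 1 - pr2                         # .685

print(round(pl1_left, 3), round(pl1_right, 3))            # 0.65 0.37
print(round(pl2_left_left, 3), round(pl2_left_right, 3))  # 0.825 0.685
```

The printed values match the .65/.37 and .825/.685 results in the worked problem above.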

