
Chapter 6: Adaptation

Adaptation = Learning
- Before, we assumed that people weren't learning anything and that the world was static.
- Previous chapters were about "calculated rationality": some sort of choice in a static system.
- Here, we assume that a person acts, the world reacts, and then the person needs to (choose to) adapt to the world's response. Until now we assumed that neither of the two parties reacted to the first trial. Now we get into the reality of interaction with a history. Example: you consider the history of a friendship when making choices.
- Thus, "adaptively rational."

Adaptation and Communication
- Rats in a maze: the cheese is a message that means "turn toward here." The rat is assumed to be working in a T maze, with some probability of going left and some probability of going right; the cheese is the reward. We are studying how we learn. The cheese is the message; the rat receives the message and searches for the reward. The rat arranges his life so he gets more rewards. Ex.: you see what things get you dumped, and you quit doing them.
- A kid gets in trouble to get attention: the trouble is a message that says "attend to me." There are all kinds of other applications, e.g., you teach your boyfriend to do this and not that by kissing, etc.
- Someone, or the world, communicates to a person, who then learns and adapts.
- Teaching, persuading, influencing, attracting: does all communication ask for adaptation by the receiver?

Basic Model
- Reinforcement learning is the main topic of the chapter: you are rewarded for one choice and not rewarded for another. Simplified further: no punishments.
- Also known as operant behavior.
- Alternative learning models: S-R learning, modeling behavior, etc.
- Example of a rat, Alfred, in a maze:
  - Initially random behavior
  - Finds some reinforcement
  - Adapts behavior to the prospect of reward

Alfred
- Trial-and-error learning
- Behavior becomes less and less random
- At first, PR(0) = PL(0) = 50%. You absolutely have to keep track of what trial you are on.
  You are on trial 0 and getting ready for trial 1.
- PR is the probability of a right turn.
- (0) refers to time zero, before the first trial.
- Since the food is always on the right, Alfred will learn to turn right, and the probability of turning right will increase over time. His probability at trial 0 is 50%, but that will rise.
- PR(t + 1) = PR(t) + some increment. After you find cheese you are more likely to go right again, so we add an increment to the equation.
- So we can understand our problem as needing to model the increment: we want to know how quickly he adapts.

Modeling the Increment
Model 1: a constant increment model (a bad model; we won't use it)
- Suppose the increment is .2 and the probability goes up evenly.
  PR(0) = .50
  PR(1) = .70 (probability at trial 1, in anticipation of trial 2)
  PR(2) = .90
  PR(3) = 1.10 (probability at trial 3, in anticipation of trial 4)
- It looked reasonable for a while, but a probability of 1.10 is impossible, so we have to do something other than add a constant increment. The constant increment model is bad because it leads to impossible predictions.

Model 2: a constant proportion model (the good model that we will actually use)
- If the cheese is always on the right, then 1 minus the probability of going right at time t is what he has left to learn, what he has remaining to learn. It is the amount of error.
- [1 - PR(t)] represents how much Alfred has yet to learn.
- Suppose we model him this way: at each trial, Alfred learns a constant proportion of what he has left to learn. Every time he learns, he reduces what he has left to learn by some proportion.
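Looking back at Model 1, the constant-increment arithmetic can be checked with a minimal sketch. The function name constant_increment is made up for illustration; the starting probability (.50) and the increment (.2) are the values from the notes.

```python
def constant_increment(p0, increment, trials):
    """Return [PR(0), PR(1), ..., PR(trials)] under Model 1.

    Hypothetical helper for illustration: the same fixed amount is
    added after every trial, with no cap at 1.
    """
    probs = [p0]
    for _ in range(trials):
        probs.append(probs[-1] + increment)
    return probs


# Values from the notes: PR(0) = .50, increment = .2
model1 = constant_increment(0.50, 0.20, 3)

for t, p in enumerate(model1):
    # Flag any value that cannot be a probability.
    flag = "  <- impossible (greater than 1)" if p > 1 else ""
    print(f"PR({t}) = {p:.2f}{flag}")
```

By the third trial the value passes 1, which is why the notes reject this model.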
Modeling the Increment
- We are trying to find the increment.
- Model 2 requires some constant, a (it will be given to us), that represents the proportion Alfred learns on each trial.
- PR(1) = PR(0) + a[1 - PR(0)]
- This means: the probability of going right on some trial = that probability from the previous trial, plus the rate of learning (a) times what had yet to be learned on the prior trial. In other words, the next trial's probability is the same as the one before, plus an increment.
- a is measurable, and it won't be the same from person to person. Some people learn faster than others. Some people get a reward and do the same thing over and over; some people get a reward and then think they could get a bigger reward by doing something different next time.
- Model 2 never gives a result where the probability is greater than 1. You will never get all the way up to 100%.
- Since a is a fraction, at each point you just add a fraction of what was left to learn.
- Produces an asymptotic curve, where Alfred's behavior approaches PR = 1 (p. 255: the asymptote).

Modeling Adaptation
- Model 2 fits observed results reasonably well.
- Predicts:
  - Learning occurs.
  - Probabilities rise faster at the start of learning: you have more to correct at the beginning, and each time you correct a fraction of what remains.
  - Stupid choices remain possible, though decreasingly likely: the rat can say "oh, what the hell, let's go left."
  - We can measure the learning rate, a: how long does it take us to learn? It will be important in applications.

Adaptation Equations
- Expanding on Model 2, the constant proportion model
- 6 assumptions specified

Assumptions
- Alternate behaviors available (e.g., Left, Right)
- State of the individual described by probabilities that add up to 1: if your probability of going right is 40%, then your probability of going left is 60%.
- World responds differently to various behaviors (e.g., Cheese, No Cheese): the world is going to respond to you, and it will reward some things you do and not reward others.
  We are ignoring the possibility of punishments.
- State of the world described by probabilities of various responses to each behavior (e.g., Right = 70% Cheese): so far we've had cheese on the right 100% of the time, but as we move forward there could be cheese on the left 30% of the time, or even cheese on both sides.
- Set of possible events is the combination of possible behaviors crossed by possible world responses (e.g., Left/No Reward, Left/Reward). 4 possibilities: left and cheese, left and no cheese, right and cheese, right and no cheese.
- Specify adaptation equations for each possible event (e.g., event 1 = Left/No Reward): there are 4 possibilities, so there will be 4 equations.

Adaptation Equations
- So the model generates sets of equations, and these describe the learning/adaptation: the process shows how the probabilities change over time. Eventually there will be a 99% chance that the rat goes right, having started at 50/50.
- Accomplished by explaining how the
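For the simplest case in these notes (cheese always on the right, no punishments), the constant proportion update can be sketched as below. This is only an illustration under stated assumptions: the function names are made up, the learning rate a = .2 is a hypothetical value (the notes say a will be given), and the update is applied on every trial, i.e. PR(t + 1) = PR(t) + a[1 - PR(t)], which generalizes the PR(1) equation in the notes.

```python
def constant_proportion(p0, a, trials):
    """Return [PR(0), ..., PR(trials)] under Model 2.

    Hypothetical helper: each trial adds the fraction `a` of what is
    still left to learn, [1 - PR(t)].
    """
    probs = [p0]
    for _ in range(trials):
        remaining = 1 - probs[-1]          # what Alfred has yet to learn
        probs.append(probs[-1] + a * remaining)
    return probs


def trials_to_reach(p0, a, target):
    """Count the trials needed before PR(t) first reaches `target`."""
    if not (0 < a <= 1) or not (0 <= target < 1):
        raise ValueError("need 0 < a <= 1 and 0 <= target < 1")
    p, t = p0, 0
    while p < target:
        p += a * (1 - p)
        t += 1
    return t


A = 0.2  # assumed learning rate, not a value from the notes

model2 = constant_proportion(0.50, A, 5)
for t, p in enumerate(model2):
    print(f"PR({t}) = {p:.4f}")

# How long until there is a 99% chance of going right, starting at 50/50?
print("Trials to reach .99:", trials_to_reach(0.50, A, 0.99))
```

Each increment is smaller than the one before it, so the probability climbs quickly at first and then levels off below 1. Equivalently, what is left to learn shrinks by a factor of (1 - a) on every trial: 1 - PR(t) = (1 - a)^t [1 - PR(0)].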



UMD COMM 402 - Chapter 6: Adaptation
