CHAPTER 6
ADAPTIVE BEHAVIOR

- An action is taken; the world responds to the action; the individual infers something about the world and then adapts his behavior so as to secure desirable responses.
- Consider humans as adaptively rational: they are assumed to learn from trial and error.
- All examples involve a response of a human or an animal.
- The response changes over time.
- Humans experiment with various alternatives and choose some more often than others because of the pleasurable or unpleasurable consequences they have experienced following the choice.
- Choice processes increase the effectiveness of behavior in achieving individual goals.
- "False learning": when people learning in an apparently intelligent way come to believe things that are not true.

THE BASIC MODEL

- Reinforcement learning: the main adaptation process the chapter focuses on.
- Observe the adaptive behavior of an animal in a maze (a T-maze, named for its T shape).
- The mouse is placed in a starting box at the head of the maze, wanders around, and ends up in 1 of 2 goal boxes.
- All doors are 1-way: once the mouse goes through them, it cannot go back out.
- The mouse cannot see what is in a goal box until it enters it.

ALFRED

- Food is put in the right-hand goal box and the left goal box is left empty. Alfred is then placed in the starting box with no idea that there is food in the right goal box.
- First trial: Alfred eventually goes left after some time.
- Half an hour later: he leaves the starting box quickly and eventually turns right instead of left, discovering the food.
- Learning trials: every half hour, food is put in the right-hand goal box, Alfred is put in the starting box, and his behavior is observed.
- After more and more trials, he eventually turns right each time.
- His original behavior was random until he learned that some kinds of actions bring pleasant rewards; he now performs those actions instead.
- Overall, he has adapted to the situation.
- Behavior (turning left or right) that is reinforced (rewarded) becomes more frequent, whereas behavior that is not rewarded becomes less frequent.
- This applies to humans as well.
- Example: a mother teaching a child the alphabet holds up a letter, the child makes a sound, and the mother smiles if it is correct; if incorrect, no smile.
    Subject = child
    Behavior = sound the child makes
    Reward = mother's smile
- Pr(0) = initial probability of turning right before the 1st trial
- Pl(0) = initial probability of turning left before the 1st trial
- Over time, Pr will increase and Pl will decrease because only the right goal box has food and the animal is capable of learning the environment.
- Pr(t+1) = Pr(t) + some increment
- Pr at time t+1 is related to Pr at time t.
- Pr(t+1) has increased, and the amount of this increase (the amount learned as a result of the trial) is the increment.

A Constant Increment Model (fails)

- Assume Alfred happened to turn right initially, was rewarded, and that the learning increment for turning right is 0.2:
    Pr(t+1) = Pr(t) + 0.2
- If Alfred was originally neutral in turning preference, that is Pr(0) = 0.5, and he happens to turn right on the 1st trial, then:
    Pr(1) = Pr(0) + 0.2 = 0.5 + 0.2 = 0.7
- At the beginning of the 2nd trial, Alfred's probability of turning right is 0.7.
- If he turns right on the 2nd trial:
    Pr(2) = Pr(1) + 0.2 = 0.7 + 0.2 = 0.9
- After 2 trials Alfred has almost completely adapted to the situation: a 90% chance of turning correctly (right).
- A third rewarded right turn gives:
    Pr(3) = Pr(2) + 0.2 = 0.9 + 0.2 = 1.1, which is impossible
- Trouble with the assumption: an increment of 0.2 could be too large, but making the increment smaller still leads to impossible numbers after enough trials.
- We need a different adaptation equation that stays within the 0-1 probability range.
- We need a model with a variable increment.

A Constant Proportion Model

- The quantity (1 - Pr) represents the amount that Alfred has yet to learn about the maze.
- If the current probability of turning right is 0.7, the amount he has left to learn is 0.3 (1 - 0.7).
- Assume that in each trial Alfred learns a constant proportion of the amount he has left to learn.
- a = learning proportion (rate), so the increment is a(1 - Pr).
- Suppose Alfred's initial chance of turning right is 0.5, the learning rate a is 0.3, and he turns right the 1st time:
    Pr(1) = Pr(0) + increment
    Pr(1) = Pr(0) + a[1 - Pr(0)] = 0.5 + 0.3(1 - 0.5) = 0.5 + 0.15 = 0.65
- Alfred again makes the correct turn (right) on the 2nd trial:
    Pr(2) = Pr(1) + a[1 - Pr(1)] = 0.65 + 0.3(1 - 0.65) = 0.755
- The learning increment is always positive because he learns something on every trial.
- The increment gets smaller as he gets closer to 100% correct behavior.
- General principle of learning: behavior that is reinforced tends to become more frequent, or probable.

6 ASSUMPTIONS OF OUR MODEL

1. Alternative behaviors for the individual
    - may turn left or right
2. State of the individual
    - the probabilities of the behaviors add up to 1
3. Alternative responses of the world
    - cheese, no cheese
4. State of the world
    - the world has some set of rules for its own behavior
    - probabilities of various responses to each behavior
    - example: right = 70% cheese
5. Set of possible events
    - combination of possible behaviors crossed with possible world responses
    - example: go left, no reward; go right, reward
6. Adaptation equations
    - a specific equation for each possible event
    - example: the event left/no reward

ADAPTATION EQUATIONS

E1 = Left and Reward:      Pl(t+1) = Pl(t) + a[1 - Pl(t)]
E2 = Left and No Reward:   Pl(t+1) = Pl(t) - bPl(t)
E3 = Right and Reward:     Pr(t+1) = Pr(t) + a[1 - Pr(t)]
E4 = Right and No Reward:  Pr(t+1) = Pr(t) - bPr(t)

- a = learning rate associated with rewarded behavior
    - a number in the range 0-1
    - shows the rate of response to reinforcement
    - low values: slow learning; high values: fast learning
- b = learning rate associated with nonreward
    - its range of possible values and its interpretation are identical to those of a
- These 4 equations rest on a few simple principles:
    - behavior that is reinforced becomes more probable and behavior that is not reinforced becomes less probable
    - rewards cause behavior to change at rate a; nonrewards cause behavior to change at rate b
    - the amount of adaptation on any trial is always a constant fraction (a or b) of the amount left to be learned

MEANING OF PROBABILITY IN ADAPTATION MODELS

- The probability assumption is a simplification of the real world.
- We only need to observe behavior.
- Probability is used as a shorthand aggregation of the individual's complexity.
- What matters is that the world's behavior is not constant.

SIGNIFICANCE OF A AND B

- a = learning rate associated with rewarded behavior
- b = learning rate associated with nonreward
- Both numbers are between 0 and 1.
- Small values = slow changes; higher values = rapid changes.
- Motivation: used to summarize the attributes of a reward or the current state of the subject that make a given reward important to a given subject at a specific time.
- Learning ability: the ability of the subject to draw inferences and modify behavior on the basis of experience.
- If ability or motivation vary from situation to
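
The constant increment model, the constant proportion model, and the four adaptation equations (E1-E4) from the notes above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the chapter: the function names, the `reward_right` / `reward_left` parameters (the world's reward probabilities), the seed, and the value b = 0.2 are all assumptions chosen for the example. It also assumes Pr = 1 - Pl, which follows from the notes' statement that the probabilities add up to 1.

```python
import random


def constant_increment(p, step=0.2):
    """Constant increment model: Pr(t+1) = Pr(t) + step. Can exceed 1."""
    return p + step


def reward_update(p, a):
    """Rewarded behavior (E1, E3): P(t+1) = P(t) + a * (1 - P(t))."""
    return p + a * (1.0 - p)


def nonreward_update(p, b):
    """Unrewarded behavior (E2, E4): P(t+1) = P(t) - b * P(t)."""
    return p - b * p


def simulate(p_right, a, b, reward_right, reward_left, trials, seed=0):
    """Run learning trials and return the history of Pr, starting with Pr(0).

    reward_right / reward_left are hypothetical parameters: the chance that
    the world rewards a right or a left turn.
    """
    rng = random.Random(seed)
    history = [p_right]
    for _ in range(trials):
        if rng.random() < p_right:
            # The subject turns right: apply E3 or E4 to Pr.
            if rng.random() < reward_right:
                p_right = reward_update(p_right, a)
            else:
                p_right = nonreward_update(p_right, b)
        else:
            # The subject turns left: apply E1 or E2 to Pl, then Pr = 1 - Pl.
            p_left = 1.0 - p_right
            if rng.random() < reward_left:
                p_left = reward_update(p_left, a)
            else:
                p_left = nonreward_update(p_left, b)
            p_right = 1.0 - p_left
        history.append(p_right)
    return history


if __name__ == "__main__":
    # Constant increment: three rewarded right turns starting from 0.5.
    p = 0.5
    for trial in range(1, 4):
        p = constant_increment(p)
        print("constant increment, trial", trial, "Pr =", round(p, 3))

    # Constant proportion with a = 0.3: two rewarded right turns from 0.5.
    p = 0.5
    for trial in range(1, 3):
        p = reward_update(p, 0.3)
        print("constant proportion, trial", trial, "Pr =", round(p, 3))

    # Alfred's maze: food always on the right, never on the left.
    path = simulate(0.5, a=0.3, b=0.2, reward_right=1.0, reward_left=0.0, trials=20)
    print("Pr after 20 trials:", round(path[-1], 4))
```

The first loop reproduces the failure of the constant increment model (the third step goes past 1), and the second reproduces the worked values 0.65 and 0.755 from the notes. In the maze simulation every trial moves Pr toward 1, because a rewarded right turn raises Pr directly and an unrewarded left turn lowers Pl.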



UMD COMM 402 - CHAPTER 6
