Chapter 6 Textbook: Adaptation
Earlier chapters assumed a static world.
Learning

1 What is the general picture of adaptation and its relationship to communication?
Adaptive behavior
o An action is taken, the world responds to the action, and the individual infers something about the world.
o The individual then adapts his behavior to secure desirable responses.
o Humans are adaptively rational: they learn from trial and error.
o Rats in a maze: the cheese is a message that means "turn here."
o A kid in trouble to get attention: the trouble is a message that says "attend to me."
o Someone, or the world, communicates to a person, who then
o Assume that a human animal chooses some alternative; that, following his choice, he receives some kind of reward or penalty; that in some way he notes the result and attributes it to his choice; and that over time he reduces his propensity to choose alternatives that have been followed by bad consequences and increases his propensity to choose alternatives that have been followed by good consequences. In short, he learns and adapts.
Basic idea of the model
Common factors:
o All involve the response of a human animal.
o The response changes over time and adapts on the basis of experience.
o Consequences are either pleasurable or unpleasurable.
o We expect these choice processes to increase the effectiveness of behavior in achieving individual goals; thus behavior is adaptively rational.
o False learning: people learning in an apparently intelligent way can come to believe things that are not true.
Basic model: reinforcement learning
o Mouse example

2 What is the basic model of the chapter?
T maze: the mouse wanders around exploring until it ends up in one of the two goal boxes.
o All of the doors are one-way; once the mouse goes through one, it cannot go back.
o The mouse cannot see what is in a goal box until it enters it.
o Food is in the right-hand box; the left-hand box is empty.
Put Alfred in starting box A two times. At the intersection, where will he turn? The first time he turned left; the second time he turned right and found food.
Put food in the right-hand box, put Alfred in the starting box, and observe his behavior in the maze (learning trials).
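The learning trials just described can be simulated in a few lines. This is a hypothetical sketch, not the textbook's procedure: the function name `run_t_maze`, the update rule (the constant proportion idea these notes develop for question 4, with an assumed proportion `theta = 0.4`), and the choice to leave Alfred's probability unchanged after an unrewarded left turn are all illustrative assumptions.

```python
import random


def run_t_maze(n_trials, p_right=0.5, theta=0.4, seed=0):
    """Simulate Alfred's learning trials with food in the right-hand box only.

    Hypothetical sketch: the constant proportion update and theta = 0.4 are
    illustrative assumptions, as is leaving P_R unchanged after an
    unrewarded left turn.
    Returns the list [P_R(0), ..., P_R(n_trials)] and the turns taken.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    history = [p_right]
    turns = []
    for _ in range(n_trials):
        turned_right = rng.random() < p_right  # turn right with probability P_R(t)
        turns.append("R" if turned_right else "L")
        if turned_right:
            # Food in the right-hand box: the rewarded turn becomes more likely.
            p_right += theta * (1.0 - p_right)
        # Empty left-hand box: assumed to leave P_R (and P_L = 1 - P_R) unchanged.
        history.append(p_right)
    return history, turns


history, turns = run_t_maze(30)
print("turns:", "".join(turns))
print(f"P_R(30) = {history[-1]:.4f}, P_L(30) = {1.0 - history[-1]:.4f}")
```

Typically the early turns mix L and R and the later ones are almost all R, which is the qualitative pattern described for Alfred. Under these assumptions the probability of turning right never decreases and never exceeds 1.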
Alfred will occasionally turn left, but as the experiment continues he will gradually turn right more often, and after many trials he will turn right every time.
o Alfred has learned to manipulate the world.
o He has adapted to the situation: his original behavior was random; he learned that some kinds of actions bring pleasant rewards, and now he performs those actions.
o Behavior (turning left or right) that is reinforced (rewarded) becomes more frequent, whereas behavior that is not reinforced becomes less frequent.
A mouse is placed in a maze and has the choice to turn left or right. If it goes left, it receives food; if it goes right, it receives nothing. The mouse is placed in the same maze several times. The experimenter observes that rather rapidly it stops going right and always goes left.
The mouse has some initial disposition to turn right or left in the maze:
o $P_R(0)$: initial probability of turning right, before the first trial
o $P_L(0)$: initial probability of turning left, before the first trial
o The (0) indicates that we are referring to the mouse's initial probability, the probability before any training has taken place.
o $P_R(0)$ and $P_L(0)$ are related: if we observe only two possible outcomes (the mouse ends up in the right box or the left box), the probabilities must add to 1, $P_R(0) + P_L(0) = 1$. Either right or left must occur, and only one of them can occur on any one trial.

3 Understand the Alfred example.

4 How could we model the increment? What are the possible models, and which one works?
Concerned with adaptation:
o Adaptation means a change in probabilities.
o A model of the learning process must specify how these probabilities change over time.
o If only the right-hand box contains food and the animal is capable of learning about its environment, then over time $P_R$ will increase and $P_L$ will decrease:
  $P_R(t+1) = P_R(t) + \text{increment}$
  $P_R$ at time $t+1$ is related to $P_R$ at time $t$: $P_R(t+1)$ has increased, and the amount of this increase (the amount learned as a result of the trial) is the increment.
Constant increment model
o Assume Alfred happened to turn right initially, was rewarded, and that the learning increment for turning right is 0.2: $P_R(t+1) = P_R(t) + 0.2$.
o If Alfred was originally neutral in turning preference, $P_R(0) = 0.5$, and happens to turn right on the first trial, then $P_R(1) = P_R(0) + 0.2 = 0.5 + 0.2 = 0.7$. So at the beginning of the second trial, Alfred's probability of turning right is 0.7.
o Suppose he happens to turn right on the second trial: $P_R(2) = P_R(1) + 0.2 = 0.7 + 0.2 = 0.9$. So after two trials Alfred has almost completely adapted to the situation, with a 90% chance of turning correctly.
o Applying the adaptation equation to calculate $P_R(3)$ gives $0.9 + 0.2 = 1.1$, which is an impossible probability.
o Perhaps the trouble is the assumption that the learning increment equals 0.2; maybe 0.2 is too large.
o What we need is a different adaptation equation, one that stays within the [0, 1] probability range: any model with a constant increment, no matter how small, must eventually produce illegal probabilities.
o Possible modification: when the model predicts numbers greater than 1 or less than 0, we interpret them as 1 or 0. This loses some elegance, but it is possible.
o Even so, the constant increment model is not consistent with most data.
Constant proportion model
o The quantity $1 - P_R$ represents the amount that Alfred has yet to learn about the maze.
o If the current probability of turning right is 0.7, then the amount left to learn is 0.3.
o If the current probability of turning right were 1, there would be nothing left to learn about the maze.
o Assume that on each trial Alfred learns a constant proportion of the amount he has yet to learn.
Summary
Constant increment model
o Fails because it implies probabilities outside the [0, 1] range and because it does not fit the data well.
Constant proportion model
o Does produce acceptable probabilities and acceptable predictions.

5 What are the assumptions of the various adaptation equations?
Assumption 1: alternative behaviors for the individual
o Assume that any individual has some set of possible alternative behaviors in which he might engage; for example, the mouse may turn left or right.
o Start with a list of mutually exclusive alternative behaviors.
Assumption 2: state of the individual
o Assume the individual is in some state with respect to the alternative behaviors at any point in time.
o Assume the individual's state may be described by a set of probabilities, one for each alternative behavior.
o In the T maze, the mouse has a probability of going left and a probability of going right at any point in time; these probabilities add up to 1. Initially the probability of going right (and of going left) is 0.5. As adaptation proceeds, these probabilities may change.
Assumption 3: alternative responses of the world
o The world has a repertoire of possible responses to the behavior of the individual: giving food, giving water, love, etc.
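The adaptation equations in question 4 can be compared numerically. In this sketch the constant proportion rule is written as $P_R(t+1) = P_R(t) + \theta\,[1 - P_R(t)]$; the symbol θ and its value 0.4 are our own illustrative assumptions (the notes give no value), chosen so that $P_R(1) = 0.7$ matches the constant increment example. All function names are hypothetical, and each trajectory assumes Alfred turns right and is rewarded on every trial. The `state` helper encodes Assumptions 1 and 2: one probability per alternative, summing to 1.

```python
def constant_increment(p, delta=0.2):
    """P_R(t+1) = P_R(t) + delta; the notes' example uses delta = 0.2."""
    return p + delta


def clipped_increment(p, delta=0.2):
    """The proposed patch: values above 1 (or below 0) are read as 1 (or 0)."""
    return min(1.0, max(0.0, p + delta))


def constant_proportion(p, theta=0.4):
    """Learn a fixed share theta of what is left to learn, 1 - P_R(t).

    theta = 0.4 is a hypothetical value, not from the notes.
    """
    return p + theta * (1.0 - p)


def closed_form_proportion(t, p0=0.5, theta=0.4):
    """Closed form of constant_proportion after t rewarded trials:
    1 - P_R(t) = (1 - theta)**t * (1 - P_R(0))."""
    return 1.0 - (1.0 - p0) * (1.0 - theta) ** t


def trajectory(update, p0=0.5, n=3):
    """Return [P_R(0), ..., P_R(n)], assuming a rewarded right turn each trial."""
    ps = [p0]
    for _ in range(n):
        ps.append(update(ps[-1]))
    return ps


def state(p_right):
    """Assumptions 1-2: one probability per alternative; they sum to 1."""
    return {"right": p_right, "left": 1.0 - p_right}


rules = {
    "constant increment": constant_increment,
    "clipped increment": clipped_increment,
    "constant proportion": constant_proportion,
}
for name, rule in rules.items():
    ps = trajectory(rule)
    final = {k: round(v, 3) for k, v in state(ps[-1]).items()}
    print(f"{name:20s} {[round(p, 3) for p in ps]} final state: {final}")
```

The constant increment trajectory is 0.5, 0.7, 0.9, 1.1, reproducing the impossible $P_R(3)$ (and a negative probability of turning left); clipping caps it at 1.0. The constant proportion trajectory is 0.5, 0.7, 0.82, 0.892: because $1 - P_R(t+1) = (1 - \theta)\,[1 - P_R(t)]$, the amount left to learn shrinks geometrically, so $P_R$ approaches 1 without ever exceeding it.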