September 17, 2008
Larry Karp
ARE 263

1 Anticipated Learning

"Active learning" describes the situation in which the decision maker (DM) manipulates a state variable in order to gain information. For example, a monopoly might believe that the stock of loyal customers (a state variable) changes as a result of advertising, but not know the exact relation between advertising and the change in the stock, i.e. the monopoly might not know a parameter value in the equation of motion. The monopoly might choose its advertising level (a control variable) over time in order to learn about the unknown but fixed parameter value. The model in Clark and Mangel, where an animal decides in which patch to search for food, contains another example of active learning.

"Passive learning" describes the situation in which learning is exogenous to the DM's actions. For example, a DM will not manipulate the stock of GHG in order to learn more about the relation between these stocks and climate change. I will only discuss passive learning; but see section 1.3.3.

Modeling learning requires using a state variable to describe information. I will discuss two types of approaches, using either a discrete or a continuous distribution. In both cases I use examples rather than a general framework, in order to make the ideas as clear as possible and to keep notation to a minimum.

Throughout this section, a central assumption is that the DM anticipates learning. If the DM happens to learn, but does not anticipate that he will learn in the future, the problem is very different (and much simpler and less interesting).

1.1 Discrete distribution

Suppose that the single-period payoff is

    U(c) − ∆d(S)

where S is the stock of GHG (a state variable) in the current period and c is emissions of GHG (the control variable) in a period. The benefit of emissions is U and the damage associated with the stock is ∆d, where d is a known function and ∆ is unknown.
Section 1.2.1 considers the case where ∆ is a random variable, and we learn about its distribution. Section 1.2.2 considers the case where ∆ is a fixed parameter and we obtain information about the value of this parameter.

The stock of GHGs evolves according to

    S' = f(S, c)    (1)

where f is the growth equation and S' is the stock in the next period. Recall that I use the convention that a variable without a time subscript is the value of the variable in the current period, and a variable with a prime is the value of that variable in the next period.

1.2 Learning about ∆

First, I will discuss two ways to think about learning. We can either assume that ∆ is a random variable and learn about its distribution, or we can treat ∆ as a fixed but unknown number, which we learn about over time. Then I will consider two ways of solving the problem (using either model of learning): dynamic programming or stochastic programming.

1.2.1 ∆ is a random variable with unknown distribution

First, suppose that we treat ∆ as a random variable. We do not know the distribution of ∆, but (for the purpose of modeling) we think it is reasonable to assume that it is a draw from one of two distributions. The problem is to choose the control rule for emissions, taking into account that in the future we will have better information than today concerning which of the two possible distributions is correct.

Denote these two distributions as x1 and x2. Associated with each distribution there are two possible outcomes, G and B. Under the first distribution the probability is q that the realization of ∆ is G; under the second distribution the probability is r that the realization of ∆ is G. G and B are numbers, with G < B. The two realizations correspond to the low-damage (G) and high-damage (B) outcomes.

Table 1 gives the outcomes and probabilities associated with these two distributions.
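As a purely hypothetical illustration of the equation of motion (1), the sketch below assumes a linear growth equation f(S, c) = δS + c with a decay factor δ and a constant emissions path; none of these functional forms or numbers come from the model itself.

```python
def f(S, c, decay=0.995):
    """Hypothetical growth equation for equation (1): the stock decays
    slowly and accumulates current emissions (a linear sketch, not the
    model's actual f)."""
    return decay * S + c

S = 800.0                # initial GHG stock, arbitrary units
for t in range(10):
    c = 5.0              # constant emissions path, for illustration
    S = f(S, c)          # S' = f(S, c)
print(round(S, 2))
```

With these numbers the stock rises toward the steady state c/(1 − δ) = 1000, since current emissions exceed the decay of the initial stock.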
These are conditional probabilities, the probability of an event conditional on x.

    realization of ∆ | x = x1 | x = x2
    -----------------|--------|-------
           G         |   q    |   r
           B         | 1 − q  | 1 − r

    Table 1: The conditional distributions of random variable ∆

If q < r then GHG stocks present a greater danger if the "truth" is x = x1 rather than x = x2.

For the purpose of the model, q and r are taken to be objective probabilities. The DM is uncertain which distribution is correct, and at time t has subjective probability p_t that x = x1. The subjective probability at time t that x = x2 is therefore 1 − p_t. Denote P(p_t, ∆_t) as the posterior probability, the subjective probability that x = x1 when the prior is p_t and you observe ∆ = ∆_t:

    p_{t+1} = P(p_t, ∆_t) = Pr{x = x1 | ∆_t, p_t}.    (2)

Using Bayes' Rule we can write the values of P(p_t, ∆_t) for the two possible realizations of ∆ (see footnote 1):

    P(p_t, G) = p_t q / [p_t q + (1 − p_t) r]    (4)

    P(p_t, B) = p_t (1 − q) / [p_t (1 − q) + (1 − p_t)(1 − r)].    (5)

In this model, the subjective probability p is a state variable, the equation of motion of which is given by equation 2. Note that the evolution of p is stochastic, since it depends on the realization of a random variable, ∆. Also note that the evolution does not depend on actions that the DM takes; learning is passive.

Footnote 1: To obtain equations 4 and 5, use the rule

    P(A ∩ B) = P(A | B) P(B) = P(B | A) P(A)

to write

    P(B | A) = P(A | B) P(B) / P(A).    (3)

Associate the event A with ∆_t = G and the event B with x = x1. Equations 4 and 5 then follow directly from formula 3.

Increasing the number of possible outcomes of ∆ does not greatly increase the size of the problem; it just means that we have more possible outcomes, i.e. more equations of the form of equations 4 and 5.
In contrast, increasing the number of possible distributions increases the dimensionality of the state variable, and significantly increases the size of the problem (which has important implications for the feasibility of obtaining a numerical solution). If there are n possible distributions you need n − 1 state variables to describe beliefs; each state variable is the subjective probability that a particular distribution describes the world. Since the probabilities sum to 1, you only need n − 1 numbers to keep track of the n probabilities.

1.2.2 The "star information structure"

Kolstad (JEEM 1996) uses an alternative called the "star information structure". In this setting ∆ is a parameter (rather than a random variable) that takes a particular value, either G or B, but the DM does not know which value it takes. Let g be a signal that makes me think it is more likely that ∆ = G, and b be a signal that makes me think it is more likely that ∆ = B.
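To make the belief bookkeeping for n candidate distributions concrete: the same Bayes-rule step behind equations 4 and 5 updates an n-entry belief vector, of which only n − 1 entries are independent because they sum to 1. The three candidate distributions and all numbers below are hypothetical.

```python
def update_beliefs(p, lik):
    """One Bayes-rule step over n candidate distributions:
    p[i] is the prior Pr{x = x_i}, lik[i] is Pr{observation | x = x_i}.
    Returns the posterior belief vector (n entries summing to 1, so
    only n - 1 of them are independent)."""
    joint = [pi * li for pi, li in zip(p, lik)]
    total = sum(joint)
    return [j / total for j in joint]

# Three hypothetical candidate distributions, each assigning a different
# probability to the low-damage realization G
prob_G = [0.2, 0.5, 0.7]
beliefs = [1 / 3, 1 / 3, 1 / 3]            # flat prior

beliefs = update_beliefs(beliefs, prob_G)  # observe G once
print([round(b, 3) for b in beliefs])      # -> [0.143, 0.357, 0.5]
```

With n = 2 this reduces exactly to equations 4 and 5; the cost of larger n shows up not in this update, which is cheap, but in the dimension of the belief state over which the dynamic program must be solved.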