MATH 56A: STOCHASTIC PROCESSES
CHAPTER 4

Date: October 22, 2006.

4. Optimal stopping time

On the first day I explained the basic problem using the example in the book. On the second day I explained how the solution to the problem is given by a "minimal superharmonic" and how you can find one using an iteration algorithm. Also, a simple geometric construction gives the solution for fair random walks. On the third day I explained the variations of the game in which there is a fixed cost per move or the payoff is discounted. I also explained the transition to continuous time.

4.1. The basic problem. The problem is to find a "stopping time" which optimizes the expected value of a payoff function. I think I gave the same example as in the book: You roll a die. If you get a 6 you lose and get nothing. But if you get any other number you get the value on the die (1, 2, 3, 4 or 5 dollars). If the value is too low you can roll again. The question is: when should you stop? The answer needs to be a strategy: "Stop when you get 4 or 5," or maybe "Stop when you get 3, 4 or 5." You want the best "stopping time."

4.1.1. stopping time.

Definition 4.1. In a stochastic process, a stopping time is a time $T$ which has the property that you can tell when it arrives. I.e., whether or not $T$ is the stopping time is determined by the information that you have at time $T$.

Basically, a stopping time is a formula which, given $X_1, X_2, \dots, X_n$, tells you whether to stop at step $n$. (Or, in continuous time, given $X_t$ for $t \le T$, tells you whether $T$ is the stopping time.)

Some examples of stopping times are:
(1) the 5th visit to state $x$;
(2) the smallest time $T$ at which $X_1 + X_2 + \cdots + X_T > 100$;
(3) the first visit to the set $\{3, 4, 5\}$.

If $T$ is the first visit to state $x$ then $T - 1$ is not a stopping time. (You cannot say "stop right before you get to $x$," since the process is stochastic and you can't tell the future.)

4.1.2. payoff function. The payoff function assigns to each state $x \in S$ a number $f(x) \in \mathbb{R}$ which can be positive or negative. This represents what you gain (or lose) if you stop at state $x$. To figure out whether to stop you need to look at what you can expect to happen if you don't stop:
(1) If you stop, you get $f(x)$.
(2) If, starting at $x$, you take one step and then stop, you get
$$\sum_y p(x, y)\, f(y).$$
We assume that there is only one transient communication class and $f(x) = 0$ on all recurrent classes.

4.1.3. value function. The value function $v(x)$ is the expected payoff using the optimal strategy starting at state $x$:
$$v(x) = E(f(X_T) \mid X_0 = x).$$
Here $T$ is the optimal stopping time. If you don't know what $T$ is, then you need another equation:
$$v(x) = \max_T E(f(X_T) \mid X_0 = x).$$
This says you take all possible stopping times $T$ and choose the one which gives the maximal expected payoff.

Theorem 4.2. The value function $v(x)$ satisfies the equation
$$v(x) = \max\Bigl(f(x), \sum_y p(x, y)\, v(y)\Bigr).$$

The basic problem is to find the optimal stopping time $T$ and calculate the value function $v(x)$.
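The two candidate strategies for the die game are easy to sanity-check numerically. The short Python sketch below is my own illustration, not part of the notes: it estimates by simulation the expected payoff of the rule "stop as soon as the roll lies in a given set." Both "stop at 4 or 5" and "stop at 3, 4 or 5" come out near 3 dollars, while stopping later or earlier does worse.

```python
import random

def expected_payoff(stop_set, trials=200_000, seed=0):
    """Estimate the expected payoff of the rule: keep rolling until the
    roll is in stop_set (take that value) or is a 6 (game over, payoff 0)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        while True:
            roll = rng.randint(1, 6)
            if roll == 6:            # absorbed at 6: you lose everything
                break
            if roll in stop_set:     # stop and take the face value
                total += roll
                break
    return total / trials

for rule in [{5}, {4, 5}, {3, 4, 5}, {2, 3, 4, 5}]:
    print(sorted(rule), round(expected_payoff(rule), 2))
# roughly: [5] 2.5   [4, 5] 3.0   [3, 4, 5] 3.0   [2, 3, 4, 5] 2.8
```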
4.2. Solutions to the basic problem. On the second day I talked about solutions to the optimal stopping time problem. I started with an outline:
(1) The minimal superharmonic is optimal.
(2) The iteration algorithm converges to the minimal solution.
(3) Random walks have concave solutions.
(4) Solution for continuous time.
I explained the solutions for discrete time, then converted these into solutions for continuous time.

4.2.1. minimal superharmonic.

Definition 4.3. A superharmonic for the Markov chain $X_n$ is a real-valued function $u(x)$ for $x \in S$ so that
$$u(x) \ge \sum_{y \in S} p(x, y)\, u(y).$$
In matrix form the definition is
$$u(x) \ge (Pu)(x),$$
where $u$ is a column vector.

Example 4.4. Roll one die and keep doing it until you get a 6. (6 is an absorbing state.) The payoff function is:

  state x    payoff f(x)    probability
     1           150            1/6
     2           150            1/6
     3           150            1/6
     4           300            1/6
     5           300            1/6
     6             0            1/6

The transition matrix in this example is actually $6 \times 6$, but I simplified it to 3 states: A = 1, 2 or 3; B = 4 or 5; and C = 6:

  state x    payoff f(x)    probability
     A           150            1/2
     B           300            1/3
     C             0            1/6

Then $P$ is a $3 \times 3$ matrix:
$$P = \begin{pmatrix} 1/2 & 1/3 & 1/6 \\ 1/2 & 1/3 & 1/6 \\ 0 & 0 & 1 \end{pmatrix}.$$
The best payoff function you can hope for is $u$ = the column vector $(300, 300, 0)$. Then
$$Pu = \begin{pmatrix} 1/2 & 1/3 & 1/6 \\ 1/2 & 1/3 & 1/6 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 300 \\ 300 \\ 0 \end{pmatrix} = \begin{pmatrix} 250 \\ 250 \\ 0 \end{pmatrix}.$$
The equation $u(x) \ge (Pu)(x)$ means the $x$-entry of the vector $u$ is $\ge$ the $x$-entry of the vector $Pu$. So $300 \ge 250$ makes $u = (300, 300, 0)$ superharmonic.

Theorem 4.5. The value function $v(x)$ is the minimal superharmonic so that $v(x) \ge f(x)$ for all states $x$.

This gives a theoretical solution which is useful in some cases (such as the simple random walk).

4.2.2. iteration algorithm. As I explained it, $u(x)$ is your estimated expected payoff. The algorithm works like this. You start with $u_1$, which is the most optimistic: it is the payoff you get if you cheat on the next roll.
$$u_1(x) = \begin{cases} 0 & \text{if } x \text{ is absorbing} \\ \max_y f(y) & \text{if } x \text{ is transient} \end{cases}$$
Next, $u_2$ is your expected payoff if you play fair for one round and then cheat; $u_n$ is your payoff if you wait $n$ turns before cheating. The recursive formula for $u_{n+1}$ given $u_n$ is
$$u_{n+1}(x) = \max\bigl(f(x), (Pu_n)(x)\bigr).$$
At each stage $u_n$ is superharmonic and $u_n(x) \ge f(x)$, but the values get smaller and smaller and become minimal in the limit:
$$v(x) = \lim_{n \to \infty} u_n(x).$$
$v(x)$ is your expected payoff if you put off cheating indefinitely.

In the example,
$$u_1 = (300, 300, 0),\quad u_2 = (250, 300, 0),\quad u_3 = (225, 300, 0),\quad u_4 = (212.5, 300, 0),$$
$$v = u_\infty = (x, 300, 0),$$
where $x = 200$ is the solution of the equation
$$x = \frac{x}{2} + \frac{300}{3}.$$

4.2.3. convex value function. Suppose you have a simple random walk with absorbing walls. Then a function $u(x)$ is superharmonic if
$$u(x) \ge \frac{u(x-1) + u(x+1)}{2}.$$
In other words, the point $(x, u(x))$ is above the point which is midway between $(x-1, u(x-1))$ and $(x+1, u(x+1))$. So superharmonic is the same as concave ("concave down"). So the theorem that the value function $v(x)$ is the minimal superharmonic with $v(x) \ge f(x)$ means that the graph of $v(x)$ is the upper boundary of the convex hull of the graph of $f(x)$.

4.2.4. continuous time. In a continuous-time Markov chain you have an infinitesimal generator $A$, which is a matrix of transition rates $\alpha(x, y)$, all nonnegative except for $\alpha(x, x) = -\alpha(x)$. Since the rows add up to zero we have
$$\alpha(x) = \sum_{y \ne x} \alpha(x, y).$$
So you get a probability matrix $P$ with entries
$$p(x, y) := \frac{\alpha(x, y)}{\alpha(x)}$$
for $x \ne y$ (and $p(x, x) = 0$). This is the probability that the first jump out of $x$ goes to $y$.
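As a numerical check, here is a minimal sketch of the iteration algorithm of 4.2.2 applied to Example 4.4. Using numpy and these variable names is my own choice, not something in the notes; the data $P$, $f$ and the optimistic start $u_1 = (300, 300, 0)$ are taken from the example, and the iteration settles at the value function $v = (200, 300, 0)$.

```python
import numpy as np

# Data of Example 4.4: states A, B, C (C = "rolled a 6", absorbing).
P = np.array([[1/2, 1/3, 1/6],
              [1/2, 1/3, 1/6],
              [0.0, 0.0, 1.0]])
f = np.array([150.0, 300.0, 0.0])

# u_1: most optimistic guess -- max payoff on the transient states, 0 on the absorbing one.
u = np.array([300.0, 300.0, 0.0])

for _ in range(60):
    u = np.maximum(f, P @ u)     # u_{n+1}(x) = max( f(x), (P u_n)(x) )

print(u)                         # -> [200. 300. 0.], the value function v
```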


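To see the geometric picture of 4.2.3, the sketch below runs the same kind of iteration for a simple random walk on {0, 1, ..., 5} with absorbing walls. The payoff vector is a made-up example, not one from the notes; the iteration converges to (0, 2.5, 5, 7.5, 10, 0), which is exactly the upper boundary of the convex hull of the points (x, f(x)).

```python
import numpy as np

# Simple random walk on {0,...,5}; 0 and 5 are absorbing walls with f = 0 there.
f = np.array([0.0, 2.0, 4.0, 3.0, 10.0, 0.0])   # made-up payoff on the transient states
u = np.array([0.0] + [f.max()] * 4 + [0.0])     # optimistic start u_1, as in 4.2.2

for _ in range(500):
    # u_{n+1}(x) = max( f(x), (u_n(x-1) + u_n(x+1)) / 2 ) at the interior states
    u[1:-1] = np.maximum(f[1:-1], 0.5 * (u[:-2] + u[2:]))

print(u)   # -> [0. 2.5 5. 7.5 10. 0.]: the upper boundary of the convex hull of the graph of f
```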

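Finally, the formula $p(x, y) = \alpha(x, y)/\alpha(x)$ from 4.2.4 in code. The $3 \times 3$ generator below is a made-up example (the notes do not give one at this point); the computation just negates the diagonal to get the holding rates $\alpha(x)$, divides each row by its rate, and zeroes out the diagonal.

```python
import numpy as np

# A made-up infinitesimal generator: off-diagonal entries are the rates
# alpha(x, y) >= 0, the diagonal is alpha(x, x) = -alpha(x), and rows sum to 0.
A = np.array([[-3.0,  2.0,  1.0],
              [ 4.0, -5.0,  1.0],
              [ 1.0,  1.0, -2.0]])

alpha = -np.diag(A)            # holding rates alpha(x)
P = A / alpha[:, None]         # p(x, y) = alpha(x, y) / alpha(x) for y != x ...
np.fill_diagonal(P, 0.0)       # ... and p(x, x) = 0

print(P)   # each row x lists the probabilities that the first jump from x is to y
```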