Brandeis MATH 56A - OPTIMAL STOPPING TIME - D1146277

Home> Schools> Brandeis University> Biological Physics (MATH) > MATH 56A> OPTIMAL STOPPING TIME

DOC PREVIEW

Brandeis MATH 56A - OPTIMAL STOPPING TIME

School name Brandeis University

Course Math 56a- Stochastic processes

Pages 6

This preview shows page 1-2 out of 6 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 6 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 6 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 6 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

102 OPTIMAL STOPPING TIME4. Optimal Stopping Time4.1. Definitions. On the first day I explained the basic problem usingone example in the book. On the second day I explained how thesolution to the problem is given by a “minimal superharmonic” andhow you could find one using an iteration algorithm. Also, a simplegeometric construction gives the solution for fair random walks. Onthe third day I explained the variations of the game in which thereis a fixed cost per move and on the fourth day I did the payoff withdiscount. I skipped the continuous time problem.4.2. The basic problem. I started with the example given in thebook: You roll a die. If you get a 6 you lose and get nothing. But ifyou get any other number you get the value on the die (1,2,3,4 or 5dollars). If you are not satisfied with what you get, you can roll overand you give up your reward. For example, if you roll a 1 you probablywant to go again. But, if you roll a 6 at any time then you lose: Youget nothing. The question is: When should you stop? The answerneeds to be a strategy: “Stop when you get 4 or 5.” or maybe “Stopwhen you get 3,4 or 5.” You want to chose the best “stopping time.”4.2.1. stopping time.Definition 4.1. In a stochastic process T is called a stopping time ifyou can tell when it happens.Basically, a stopping time is a formula which, given X1, X2, · · · , Xntells you whether to stop at step n. (Or in continuous time, given Xtfor t ≤ T , tells you whether T is the stopping time.)Some examples of stopping time are:(1) the 5th visit to state x(2) 10 minutes after the second visit to y .(3) the moment the sum X1+ X2+ · · · + Xnexceeds 100.You cannot stop right before something happens. In class we dis-cussed the scenario where you are driving on the highway and you areabout to have an accident. Is the second before the moment of impacta stopping time? Even if the probability is 1, you are not allowed tocall it a stopping time because “probability one” is not good enough.You have to use the information you have until that moment to decideif this is stopping time. For example, you could say, T is the momentyour car gets within 1 cm of the car in front of you. That would be astopping time.MATH 56A SPRING 2008 STOCHASTIC PROCESSES 1034.2.2. payoff function. The payoff function is a functionf : S → Rwhich assigns to each state x ∈ S a numb er f(x) ∈ R representing howmuch you get if you stop at state x. To figure out whether to stop youneed to look at what you can expect to happen if you don’t stop.(1) If you stop you get f(x).(2) If, starting at x, you take one step and then stop you expect toget!p(x, y)f(y)You need to analyze the game before you play and decide on analgorithm when to stop. (Or you have someone play for you and yougive them very explicit instructions when to stop an take the payoff.)This stopping time is T .XTis the state that you stop in.f(XT) is the payoff that you will get.You want to maximize f(XT).4.2.3. value function. The value function v(x) is the expected payoffusing the optimal strategy starting at state x.v(x) = E(f(XT) | X0= x)Here T is the optimal stopping time. You need to remember that thisis given by an algorithm based on the information you have up to andincluding that point in time.Theorem 4.2. The value function v(x) satisfies the equationv(x) = max(f(x),!yp(x, y)v(y))In this equation,f(x) = your payoff if you stop.!yp(x, y)v(y) = your expected payoff if you continue.Here you assume you are going to use the optimal strategy if you con-tinue. That is why you will get v(y) instead of f(y). When you comparethese two (f(x) and"p(x, y)v(y)), the larger number tells you whatyou should do: stop or play.The basic problem is to find the optimal stopping time T and calcu-late the value function v(x).104 OPTIMAL STOPPING TIMEExample 4.3. Suppose that you toss a die over and over. If you getx your payoff isf(x) =#x if x $= 60 if x = 6And: if you roll a 6 you lose and the game is over. I.e., 6 is recurrent.If X0is your first toss, X1your second, etc. the probability transitionmatrix is:P =1/6 1/6 1/6 1/6 1/6 1/61/6 1/6 1/6 1/6 1/6 1/61/6 1/6 1/6 1/6 1/6 1/61/6 1/6 1/6 1/6 1/6 1/61/6 1/6 1/6 1/6 1/6 1/60 0 0 0 0 1Since v(6) = 0, the second number in the boxed equation is the productof the matrix P and the column vector v:!yp(x, y)v(y) = P v(x) =16(v(1) + v(2) + v(3) + v(4) + v(5))(for x < 6). I pointed out that, since the first 5 rows of P are the same,the first 5 entries in the column vector P v are the same (and the 6thentry is 0).4.3. Solutions to basic problem. On the second day I talked aboutsolutions to the optimal stopping time problem and I explained:(1) minimal superharmonics(2) the iteration algorithm(3) solution for random walks4.3.1. minimal superharmonic.Definition 4.4. A superharmonic for the Markov chain Xnis a realvalued function u(x) for x ∈ S so thatu(x) ≥!y∈Sp(x, y)u(y)In matrix form the definition isu(x) ≥ (P u)(x)where u is a column vector.MATH 56A SPRING 2008 STOCHASTIC PROCESSES 105Example 4.5. Roll one die and keep doing it until you get a 6. (6 isan absorbing state.) The payoff function is:states x payoff f(x) probability P1 150 1/62 150 1/63 150 1/64 300 1/65 300 1/66 0 1/6The transition matrix in this example is actually 6 × 6 as in the firstexample. But I combined these into 3 states1: A = 1, 2 or 3, B = 4 or5 and C = 6:states x payoff f(x) probability PA 150 1/2B 300 1/3C 0 1/6Then, instead of a 6 × 6 matrix, P became a 3 × 3 matrix:P =1/2 1/3 1/61/2 1/3 1/60 0 1The best payoff function you can hope for is (the column vector)u = (300, 300, 0)twhere thetmeans transpose. (But later I dropped thet.) ThenP u =1/2 1/3 1/61/2 1/3 1/60 0 13003000=2502500Since 300 ≥ 250, u = (300, 300, 0) is superharmonic.Theorem 4.6. The value function v(x) is the minimal superharmonicso that v(x) ≥ f(x) for all states x.This doesn’t tell us how to find v(x). It is used to prove that theiteration algorithm converges to v(x).1You can combine two states x, y if:(1) f(x) = f (y) and(2) the x and y rows of the transition matrix P are identical.106 OPTIMAL STOPPING TIME4.3.2. iteration algorithm. This gives a sequence of superharmonicswhich converge to v(x). You start with u1which is the most opti-mistic. This the best payoff you can expect to get:u1(x) =#0 if x is absorbingmax f(y) if x is transientIn the example, max f(y) = 300 and C is

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 6 pages.

Brandeis MATH 56A - OPTIMAL STOPPING TIME

Sign up for free to view:

Please select your school