New version page

# Berkeley CS 188 - Exam Prep 9 Solutions

Pages: 7
Documents in this Course

## This preview shows page 1-2 out of 7 pages.

View Full Document

End of preview. Want to read all 7 pages?

View Full Document
Unformatted text preview:

CS 188Fall 2019 Exam Prep 9 SolutionsQ1. HMMsConsider a process where there are transitions among a finite set of states s1, · · · , skover time steps i = 1, · · · , N.Let the random variables X1, · · · , XNrepresent the state of the system at each time step and be generated asfollows:• Sample the initial state s from an initial distribution P1(X1), and set i = 1• Repeat the following:1. Sample a duration d from a duration distribution PDover the integers {1, · · · , M}, where M is themaximum duration.2. Remain in the current state s for the next d time steps, i.e., setxi= xi+1= · · · = xi+d−1= s (1)3. Sample a successor state s0from a transition distribution PT(Xt|Xt−1= s) over the other statess06= s (so there are no self transitions)4. Assign i = i + d and s = s0.This process continues indefinitely, but we only observe the first N time steps.(a) Assuming that all three states s1, s2, s3are different, what is the probability of the sample sequences1, s1, s2, s2, s2, s3, s3? Write an algebraic expression. Assume M ≥ 3.p1(s1)pD(2)pT(s2|s1)pD(3)p(s3|s2)(1 − pD(1)) (2)At each time step i we observe a noisy version of the state Xithat we denote Yiand is produced via a conditionaldistribution PE(Yi|Xi).(b) Only in this subquestion assume that N > M. Let X1, · · · , XNand Y1, · · · , YNrandom variables definedas above. What is the maximum index i ≤ N − 1 so that X1⊥⊥ XN|Xi, Xi+1, · · · , XN−1is guaranteed?i = N − M(c) Only in this subquestion, assume the max duration M = 2, and PDuniform over {1, 2} and each xiisin an alphabet {a, b}. For (X1, X2, X3, X4, X5, Y1, Y2, Y3, Y4, Y5) draw a Bayes Net over these 10 randomvariables with the property that removing any of the edges would yield a Bayes net inconsistent with thegiven distribution.(X1) at (0,0) X1; (X2) at (2,-2) X2; (X3) at (4,0) X3; (X4) at (6,-2) X4; (X5) at (8,0) X5; (Y1) at (0,-4)Y1;(Y2) at (2,-4)Y2; (Y3) at (4,-4)Y3; (Y4) at (6,-4)Y4; (Y5) at (8,-4)Y5; (X1) – (X2);(X2) – (X3);(X3) –(X4);(X4) – (X5);(X1) – (Y1);(X2) – (Y2);(X3) – (Y3);(X4) – (Y4);(X5) – (Y5);(X1) – (X3);(X2) –(X4);(X3) – (X5);(d) In this part we will explore how to write the described process as an HMM with an extended state space.Write the states z = (s, t) where s is a state of the original system and t represents the time elapsed in thatstate. For example, the state sequence s1, s1, s1, s2, s3, s3would be represented as (s1, 1), (s1, 2), (s1, 3), (s2, 1), (s3, 1), (s3, 2).Answer all of the following in terms of the parameters P1(X1), PD(d), PT(Xj+1|Xj), PE(Yi|Xi), k (totalnumber of possible states), N and M (max duration).1(i) What is P (Z1)?P (x1, t) =(P1(x1) if t = 10 o.w.(3)(ii) What is P (Zi+1|Zi)? Hint: You will need to break this into cases where the transition function willbehave differently.P (Xi+1, ti+1|Xi, ti) =PD(d ≥ ti+ 1|d ≥ ti) when Xi+1= Xiand ti+1= ti+ 1 and ti+1≤ MPT(Xi+1|Xi)PD(d = ti|d ≥ ti) when Xi+16= Xiand ti+1= 10 o.w.Where PD(d ≥ ti+ 1|d ≥ ti) = PD(d ≥ ti+ 1)/PD(d ≥ ti).Being in Xi, ti, we know that d was drawn d ≥ ti. Conditioning on this fact, we have two choices, if d > tithenthe next state is Xi+1= Xi, and if d = tithen Xi+16= Xidrawn from the transition distribution and ti+1= 1.(4)(iii) What is P (Yi|Zi)?p(Yi|Xi, ti) = PE(Yi|Xi)2(e) In this question we explore how to write an algorithm to compute P (XN|y1, · · · , yN) using the particularstructure of this process.Write P (Xt|y1, · · · , yt−1) in terms of other factors. Construct an answer by checking the correct boxesbelow:P (Xt|y1, · · · , yt−1) = (i) (ii) (iii)(i) Pki=1PMd=1PMd0=1#Pki=1PMd=1#Pki=1#PMd=1(ii) # P (Zt= (Xt, d)|Zt−1= (si, d))# P (Xt|Xt−1= si)# P (Xt|Xt−1= sd) P (Zt= (Xt, d0)|Zt−1= (si, d))(iii) # P (Zt−1= (sd, i)|y1, · · · , yt−1)# P (Xt−1= sd|y1, · · · , yt−1) P (Zt−1= (si, d)|y1, · · · , yt−1)# P (Xt−1= si|y1, · · · , yt−1)(iv) Now we would like to include the evidence ytin the picture. What would be the running time ofeach update of the whole table P (Xt|y1, · · · , yt)?. Assume tables corresponding to any factors usedin (i), (ii), (iii) have already been computed.# O(k2)# O(k2M) O(k2M2)# O(kM)Note: Computing P (XN|y1, · · · , yN) will take time N× your answer in (iv).Just the running time for filtering when the state space is the space of pairs (xi, ti),Given Bt−1(z), the step p(zt|y1, · · · , yt−1) can be done in time kM. (size of the statespace for z).The computation to include the ytevidence can be done in O(1) per zt.Therefore each update to the table per evidence point will take (Mk)2. So it is O((M k)2).Using N steps, the whole algorithm will take O(N k2M2) to compute P (XN|Y1, · · · , YN).(v) Describe an update rule to compute P (Xt|y1, · · · , yt−1) that is faster than the one you discoveredin parts (i), (ii), (iii). Specify its running time. Hint: Use the structure of the transitionsZt−1→ Zt.Answer is O(k2M + kM ).The answer from the previous section is:P (Xt|y1, · · · , yt−1) =kXi=1MXd=1MXd0=1P (Zt= (Xt, d0)|Zt−1= (si, d))P (Zt−1= (si, d)|y1, · · · , yt−1) (5)To compute this value we only really need to loop through those transitions P (Zt= (Xt, d0)|Zt−1=(si, d)) that can happen with nonzero probability.For all Xt= s we need to sum over all factors of the form P (Zt= (s, d0)|Zt−1= (si, d))P (Xt−1=si|yi, · · · , yt−1). For a fixed s the factor P (Zt= (Xt, d0)|Zt−1= (si, d)) can be nonzero only whensi= s and d0= d + 1 (M tuples). And when si6= s and d0= 1 and d = 1, · · · , M (kM tuples).Since this needs to be performed for all k possible values of s, the answer to update the whole tableis O(k2M + kM ).3Q2. Particle Filtering ApprenticeshipConsider a modified version of the apprenticeship problem. We are observing an agent’s actions in an MDPand are trying to determine which out of a set {π1, . . . , πn} the agent is following. Let the random variableΠ take values in that set and represent the policy that the agent is acting under. We consider only stochasticpolicies, so that Atis a random variable with a distribution conditioned on Stand Π. As in a typical MDP, Stis a random variable with a distribution conditioned on St−1and At−1. The full Bayes net is shown below.The agent acting in the environment knows what state it is currently in (as is typical in the MDP setting).Unfortunately, however, we, the observer, cannot see the

View Full Document Unlocking...