Hidden Markov Models: Modeling CpG I slands & Learning O verview A H i d d en M ark ov M od el i s a “g enerative” m od el. A g enerati v e m od el i s a m od el f or g enerati ng ob serv ed d ata f rom an u nd erly i ng param eteri z ed prob ab i li ty d i stri b u ti on. For ex am ple, w e c an g enerate a seq u enc e of L state-transi ti ons and an alph ab et-seq u enc e Lx f rom a H M M th at h as k states and m alph ab ets per state. Th e j oi nt prob ab i li ty of ),x(PLLπth en c orrespond s to th e “li k eli h ood ” of th i s seq u enc e of state-transi ti ons and th e alph ab ets i n i t. We are i nterested i n th ree prob lem s v i s-à-v i s H M M s: E valu ation Gi v en a seq u enc eLx, and a g enerati v e H M M m od el c orrespond i ng to th e seq u enc e, c h arac teri z ed b y param eters θ, h ow li k ely i s i t th at th e seq u enc e w as g enerated b y th i s m od el. Th i s i s th e prob ab i li ty )|x(PLθ . D ec oding Gi v en a seq u enc eLx, and a H M M m od el w i th param eters θ, w h at’s th e m ost li k ely state-transi ti on seq u enc e, i .e., )]|,x(P[maxargLθππ. Learning Gi v en a seq u enc eLx, and H M M m od el w i th u nspec i f i ed em i ssi on and transi ti on prob ab i li ti es f or th e m od el, h ow to esti m ate th ese prob ab i li ti es su c h th at )|x(PLθ i s m ax i m i z ed . 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1 x2 x3 x 2 1 K 2Alg orith ms V iterbi Th e V i terb i alg ori th m solv es th e d ec od i ng prob lem , i .e., i t c om pu tes )]|,x(P[maxargLθππ i n )Lk(O2 ti m e and )kL(Ospac e. F orw ard Th e Forw ard alg ori th m i s b ased on solv i ng th e f orw ard prob ab i li ty prob lem , i .e., f or a seq u enc e Lx, ),x,...,x,x(P)k(fkii21i=π=. A su m of th e f orw ard prob ab i li ti es g i v es a solu ti on to th e ev alu ati on prob lem , i .e.,∑=θkLL)k(f)|x(P , i n )Lk(O2 ti m e and)kL(Ospac e. B ac kw ard Th e B ac k w ard alg ori th m i s b ased on solv i ng th e b ac k w ard prob ab i li ty prob lem , i .e., i .e., f or a seq u enc e Lx, )|x,...,x,x(P)k(bkiL2i1ii =++π=. A su m of th e b ac k w ard prob ab i li ti es also g i v es a solu ti on to th e ev alu ati on prob lem , i .e.,∑=θkL1kk0L)k(b)x(ea)|x(P , i n )Lk(O2 ti m e and)kL(Ospac e. H ow ev er, th e pri m ary m oti v ati on f or th e b ac k w ard alg ori th m i s f i nd i ng “posterior state probability ”, i .e., th e prob ab i li ty th at ob serv ati on ixc am e f rom state k, or )x|k(Pi=π , w h i c h i s d ef i ned as)x(P)i(b)i(f)x|k(Pkki==π. We m ay b e i nterested i n th i s prob ab i li ty , e.g ., i f w e w ant to f i nd th e li k eli h ood a posteri ori ’ th at a d i sh onest c asi no play er h ad rolled a u nf ai r d i e on a parti c u lar th row . P osterior D ec oding Th e posteri or state prob ab i li ty serv es as an alternati v e to V i terb i d ec od i ng , w h i c h only reports th e state-transi ti ons c orrespond i ng to th e m ost li k ely path or “parse”. H ow ev er, w e c an also d ef i ne an alternate parse as )]x|k(P[maxargik*i=π=π. Th i s parse m ay not b e parti c u larly li k ely i n i t’s enti rety i n th e m od el, and m ay ev en b e i lleg i ti m ate i f c ertai n transi ti ons are not allow ed1, b u t i t g i v es a b etter “loc ali z ed ” m easu re i f w e are i nterested i n th e m ost li k ely state at posi ti on i. 1 Durbin’s BookCpG I slands – E xample of Modeling w ith HMMs Meth y lation & Gene R eg u lation I n th e g enom e of h u m ans and oth er E u k ary oti c org ani sm s, w h enev er th e d i nu c leoti d e CG oc c u rs, th ere i s a h i g h prob ab i li ty th at i t h as b een “meth y lated”. [Th e d i nu c leoti d e CG i s also ref erred to as CpG, to d i sti ng u i sh i t f rom th e b ase pai r C-G.] M eth y lati on ref ers to th e ad d i ti on of a m eth y l g rou p ( CH 3) i n C-nu c leoti d es. M eth y lati on i s a ty pe of c h em i c al m od i f i c ati on of th e D N A, w h i c h i s epi g eneti c , i .e., i t c an b e i nh eri ted w i …
View Full Document