11School of Computer ScienceHidden Markov Model and Conditional Random FieldsProbabilistic Graphical Models (10Probabilistic Graphical Models (10--708)708)Lecture 12, Oct 29, 2007Eric XingEric XingReceptor AKinase CTF FGene GGene HKinase EKinase DReceptor BX1X2X3X4X5X6X7X8Receptor AKinase CTF FGene GGene HKinase EKinase DReceptor BX1X2X3X4X5X6X7X8X1X2X3X4X5X6X7X8Reading: J-Chap. 12, and addition papersEric Xing 2Typical structure of a gene2Eric Xing 3E0E1E2Epoly-A3'UTR5'UTRtEiEsI0I1I2intergenicregionForward (+) strandReverse (-) strandForward (+) strandReverse (-) strandpromoterGAGAACGTGTGAGAGAGAGGCAAGCCGAAAAATCAGCCGCCGAAGGATACACTATCGTCGTCCTTGTCCGACGAACCGGTGGTCATGCAAACAACGCACAGAACAAATTAATTTTCAAATTGTTCAATAAATGTCCCACTTGCTTCTGTTGTTCCCCCCTTTCCGCTAGCGGAATTTTTTATATTCTTTTGGGGGCGCTCTTTCGTTGACTTTTCGAGCACTTTTTCGATTTTCGCGCGCTGTCGAACGGCAGCGTATTTATTTACAATTTTTTTTGTTAGCGGCCGCCGTTGTTTGTTGCAGATACACAGCGCACACATATAAGCTTGCACACTGATGCACACACACCGACACGTTGTCACCGAAATGAACGGGACGGCCATATGACTGGCTGGCGCTCGGTATGTGGGTGCAAGCGAGATACCGCGATCAAGACTCGAACGAGACGGGTCAGCGAGTGATACCGATTCTCTCTCTTTTGCGATTGGGAATAATGCCCGACTTTTTACACTACATGCGTTGGATCTGGTTATTTAATTATGCCATTTTTCTCAGTATATCGGCAATTGGTTGCATTAATTTTGCCGCAAAGTAAGGAACACAAACCGATAGTTAAGATCCAACGTCCCTGCTGCGCCTCGCGTGCACAATTTGCGCCAATTTCCCCCCTTTTCCAGTTTTTTTCAACCCAGCACCGCTCGTCTCTTCCTCTTCTTAACGTTAGCATTCGTACGAGGAACAGTGCTGTCATTGTGGCCGCTGTGTAGCTAAAAAGCGTAATTATTCATTATCTAGCTATCTTTTCGGATATTATTGTCATTTGCCTTTAATCTTGTGTATTTATATGGATGAAACGTGCTATAATAACAATGCAGAATGAAGAACTGAAGAGTTTCAAAACCTAAAAATAATTGGAATATAAAGTTTGGTTTTACAATTTGATAAAACTCTATTGTAAGTGGAGCGTAACATAGGGTAGAAAACAGTGCAAATCAAAGTACCTAAATGGAATACAAATTTTAGTTGTACAATTGAGTAAAATGAGCAAAGCGCCTATTTTGGATAATATTTGCTGTTTACAAGGGGAACATATTCATAATTTTCAGGTTTAGGTTACGCATATGTAGGCGTAAAGAAATAGCTATATTTGTAGAAGTGCATATGCACTTTATAAAAAATTATCCTACATTAACGTATTTTATTTGCTTTAAAACCTATCTGAGATATTCCAATAAGGTAAGTGCAGTAATACAATGTAAATAATTGCAAATAATGTTGTAACTAAATACGTAAACAATAATGTAGAAGTCCGGCTGAAAGCCCCAGCAGCTATAGCCGATATCTATATGATTTAAACTCTTGTCTGCAACGTTCTAATAAATAAATAAAATGCAAAATATAACCTATTGAGACAATACATTTATTTTATTTTTTTATATCATCAATCATCTACTGATTTCTTTCGGTGTATCGCCTAATCCATCTGTGAAATAGAAATGGCGCCACCTAGGTTAAGAAAAGATAAACAGTTGCCTTTAGTTGCATGACTTCCCGCTGGAT4321 =)|•(θθθθypGENSCAN (Burge & Karlin),)|(, jiitjtayyp ===−111Transition probabilities:Eric Xing 4Hidden Markov Model revisitz Transition probabilities between any two statesorz Start probabilities z Emission probabilities associated with each stateor in general:A AA Ax2x3x1xTy2y3y1yT... ... ,)|(, jiitjtayyp ===−111().,,,,lMultinomia~)|(,,,I∈∀=−iaaayypMiiiittK2111().,,,lMultinomia~)(MypπππK211().,,,,lMultinomia~)|(,,,I∈∀= ibbbyxpKiiiittK211().,|f~)|( I∈∀⋅= iyxpiittθ13Eric Xing 5Applications of HMMsz Some early applications of HMMsz finance, but we never saw them z speech recognition z modelling ion channels z In the mid-late 1980s HMMs entered genetics and molecular biology, and they are now firmly entrenched.z Some current applications of HMMs to biologyz mapping chromosomesz aligning biological sequencesz predicting sequence structurez inferring evolutionary relationshipsz finding genes in DNA sequenceEric Xing 6The Forward Algorithmz We want to calculate P(x), the likelihood of x, given the HMMz Sum over all possible ways of generating x:z To avoid summing over an exponential number of paths y, define(the forward probability)z The recursion:),,...,()(def111====kttktktyxxPyαα∑−==ikiitkttktayxp,)|(11αα∑=kkTPα)(x∑∑∑∑∏∏==−==yyxx121121yy yTtTtttyyyNttyxpapp )|(),()(,πL4Eric Xing 7The Backward Algorithmz We want to compute ,the posterior probability distribution on the t thposition, given xz We start by computingz The recursion:)|( x1=ktyPForward, αtkBackward, ),...,,,,...,(),(TtkttktxxyxxPyP1111+=== x)|...()...(),,...,|,...,(),,...,(,111111111======++ktTtkttkttTtkttyxxPyxxPyxxxxPyxxP)|,...,( 11==+ktTtktyxxPβitittiikktyxpa1111+++==ββ)|(∑,A Axt+1xTyt+1yT... Axtyt... ... ... Eric Xing 8The Junction tree algorithmz A junction tree for the HMMz Rightward passz This is exactly the forward algorithm!z Leftward pass …z This is exactly the backward algorithm! A AA Ax2x3x1xTy2y3y1yT... ... ),(11xyψ),(21yyψ),(32yyψ),(TTyy1−ψ),(22xyψ),(33xyψ),(TTxyψ)(2yζ)(3yζ)(Tyζ)(1yφ)(2yφ⇒⇒),(1+ttyyψ)(ttty→−1µ),(11 ++ttxyψ)(11 ++→tttyµ)(1+↑ttyµ∑++↑++←+←−=111111tyttttttttttyyyyy )()(),()(µµψµ∑∑→−++++→−++==ttttytttyyttytttttttyayxpyxpyyyp)()|()|()()|(, 11111111µµ∑+↑→−+++→=tyttttttttttyyyyy )()(),()(11111µµψµ∑+++++←+=111111tytttttttyxpyyyp)|()()|(µ),(1+ttyyψ)(ttty←−1µ)(11 ++←tttyµ),(11 ++ttxyψ)(1+↑ttyµ5Eric Xing 9Summary of the F-B algorithm),,,...,()(def1111===−→−ktttttktyxxxPkµα)|,...,()(def111===+←−ktTtttktyxxPkµβ)|()|()()(),,(:def,ttttjtttitttTjtitjityypyxpyyxyyp1111111111++++←→−==∝===µµξ∑−==ikiitkttktayxp,)|(11ααitittiikktyxpa1111+++==ββ)|(∑,)|(,,1111==+++ittjijtitjityxpaβαξ∑∝==jjitititTititxyp,:def)|(ξβαγ11)|(),()|()(defdef1111=====+itjtitttyypjiyxpiAB()()()()tttttttttttttβαγβαξββαα∗=∗∗=∗=∗=++++−. . .. . ABBABATT11111The matrix-vector form:Eric Xing 10Posterior decodingz We can now calculatez Then, we can askz What is the most likely state at position t of sequence x:z Note that this is an MPA of a single hidden state, what if we want to a MPA of a whole hidden state sequence?z Posterior Decoding: z This is different from MPA of a whole sequenceof hidden statesz This can be understood as bit error ratevs. word error rate)()(),()|(xxxxPPyPyPktktktktβα====11)|(maxarg*x1==ktktyPk{} : *TtytktL11 ==Example:MPA of X ?MPA of (X, Y) ?x y P(x,y)0 0 0.350 1 0.0510 0.311 0.36Eric Xing 11Viterbi decodingz GIVEN x =x1, …, xT, we want to find y = y1, …, yT, such that P(y|x) is maximized:y*= argmaxyP(y|x) = argmaxπP(y,x) z Let= Probability of most likely sequence of states ending at state yt= kz The recursion:z Underflows are a significant problemz These numbers become extremely small – underflowz Solution: Take the logs of all values:),,...,,,...,(max,--},...{-1111111==kttttyyktyxyyxxPVtitkiikttktVayxpV11−==,max)|(x1x2x3……………………...……..xNState 12Kx1x2x3……………………...……..xNState 12Kx1x2x3……………………...……..xNState 12Kx1x2x3……………………...……..xNState 12KVi(t)ktVttttxyxyyyyyyttbbaayyxxp,,,,),,,,,( LLKK11121111−=π()()itkiikttktVayxpV11−++==,logmax)|(logEric Xing 12The
View Full Document