CS 559: Machine Learning Fundamentals and Applications
6th Set of Notes

Instructor: Philippos Mordohai
Webpage: www.cs.stevens.edu/~mordohai
E-mail: [email protected]
Office: Lieb 215

Project Proposal
• Project title
• Data set
• Project idea: what is the objective, what method(s) will be tested?
• Relevant papers
– Optional, but a good strategy
• Software you plan to write
• Experiments you plan to do
• Has to be approved by me before October 19

Overview
• Note on overfitting (Week 5 notes)
• DHS Ch. 4 (Sec. 4.1-4.5) (Week 5 notes)
– Non-parametric classification
• Density estimation
• Parzen windows
• K-Nearest Neighbor estimation
• Hidden Markov Models (DHS Sec. 3.10-3.10.5)
• Principal Component Analysis (DHS Sec. 3.8.1 – essentially notes only)
– Intro
– Singular Value Decomposition

Hidden Markov Models

Markov Chains
• Goal: make a sequence of decisions
– Processes that unfold in time: the states at time t are influenced by the state at time t-1
– Applications: speech recognition, gesture recognition, part-of-speech tagging, DNA sequencing
– Any temporal process without memory
• ω^T = {ω(1), ω(2), …, ω(T)} denotes a sequence of T states
• For example, we might have ω^6 = {ω1, ω4, ω2, ω2, ω1, ω4}
– The system can revisit a state at different steps, and not every state needs to be visited

First-Order Markov Models
• The production of any sequence is described by the transition probabilities
P(ωj(t+1) | ωi(t)) = aij
• Notes
– aij does not have to be symmetric
– aii is not 0 in general
• The model is θ = (aij, ω^T); for the example sequence ω^6 above,
P(ω^T | θ) = a14 a42 a22 a21 a14

Example: Speech Recognition
• "Production of spoken words"
• Production of the word "pattern", represented by the phonemes
/p/ /a/ /tt/ /er/ /n/ /-/ (where /-/ is a silent state)
• Transitions from /p/ to /a/, /a/ to /tt/, /tt/ to /er/, /er/ to /n/, and /n/ to the silent state

Speech Analysis
[Figure illustrating speech analysis; image not preserved in the text extraction]
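To make the first-order model concrete, here is a minimal Python sketch of the sequence-probability product shown above. The transition matrix values are made-up illustrations (they are not from the notes); only the indexing mirrors the a14 a42 a22 a21 a14 example.

```python
import numpy as np

# Hypothetical transition matrix for 4 states; A[i, j] = P(omega_j at t+1 | omega_i at t).
# Values are made up for illustration; each row sums to 1.
A = np.array([
    [0.2, 0.3, 0.1, 0.4],
    [0.3, 0.1, 0.4, 0.2],
    [0.1, 0.4, 0.2, 0.3],
    [0.4, 0.2, 0.3, 0.1],
])

def sequence_probability(states, A):
    """P(omega^T | theta): product of transition probabilities along the state sequence."""
    p = 1.0
    for i, j in zip(states[:-1], states[1:]):
        p *= A[i, j]  # a_ij for the transition omega_i -> omega_j
    return p

# The notes' example omega^6 = {w1, w4, w2, w2, w1, w4}, written 0-indexed,
# which yields a14 * a42 * a22 * a21 * a14.
print(sequence_probability([0, 3, 1, 1, 0, 3], A))
```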
Example: 3 Coins
• Assume there are 3 coins:
– one biased towards Heads
– one biased towards Tails
– one unbiased
• The person behind the curtain tosses one coin repeatedly, then switches to another and tosses that, and so on
• Assume you see these observations:
HTHTHTHHTHTTHTTTTHTTTHTTTTTHHHHTHHTHHHH
• What would you think is the most likely explanation as to which coins he is using?

Definition
• Doubly stochastic process with an underlying stochastic process that is not observable (hidden), but can only be observed through another set of stochastic processes that produce the sequence of observed symbols
• The observations are the outcomes of the tosses
• The biased coins are the hidden states

Markov Chains
[Figure comparing three graphical models:]
• 1st-order Markov chain
• 2nd-order Markov chain
• 1st-order Markov chain with stochastic observations: an HMM

Hidden Markov Model (HMM)
• Interaction of the visible states with the hidden states
• Transition probabilities for the emission of visible states (a.k.a. emission probabilities):
bjk = P(vk(t) | ωj(t)), with Σk bjk = 1 for all j
• 3 problems are associated with this model:
– The evaluation problem
– The decoding problem
– The learning problem

The Evaluation Problem
• The probability that the model produces a sequence V^T of visible states is

P(V^T) = Σ r=1..rmax P(V^T | ω_r^T) P(ω_r^T)

where each r indexes a particular sequence ω_r^T = {ω(1), ω(2), …, ω(T)} of T hidden states
• The two factors are

P(V^T | ω_r^T) = Π t=1..T P(v(t) | ω(t))    (1)

P(ω_r^T) = Π t=1..T P(ω(t) | ω(t-1))    (2)

• Using equations (1) and (2), we can write

P(V^T) = Σ r=1..rmax Π t=1..T P(v(t) | ω(t)) P(ω(t) | ω(t-1))

• Interpretation: the probability that we observe the particular sequence of T visible states V^T is equal to the sum, over all rmax possible sequences of hidden states, of the conditional probability that the system has made a particular transition, multiplied by the probability that it then emitted the visible symbol in our target sequence

Example
• Let ω1, ω2, ω3 be the hidden states, v1, v2, v3 the visible states, and V^3 = {v1, v2, v3} the observed sequence of visible states
• First possibility: the hidden sequence is ω1 (t=1), ω2 (t=2), ω3 (t=3), contributing the term
P(ω1) P(v1 | ω1) P(ω2 | ω1) P(v2 | ω2) P(ω3 | ω2) P(v3 | ω3)
• Second possibility: the hidden sequence is ω2 (t=1), ω3 (t=2), ω1 (t=3), contributing
P(ω2) P(v1 | ω2) P(ω3 | ω2) P(v2 | ω3) P(ω1 | ω3) P(v3 | ω1)
• The full sum contains one such term for every possible hidden sequence: all 3^3 = 27 cases!
• In general rmax = c^T, where c is the number of hidden states
• Therefore:

P({v1, v2, v3}) = Σ over all possible sequences of hidden states Π t=1..3 P(v(t) | ω(t)) P(ω(t) | ω(t-1))

The Forward Algorithm
• Direct computation is prohibitively expensive; the forward algorithm is a feasible recursive alternative:

αj(t) = 0, if t = 0 and j is not the initial state
αj(t) = 1, if t = 0 and j is the initial state
αj(t) = [Σi αi(t-1) aij] bjk v(t), otherwise

where bjk v(t) denotes the emission probability bjk for the symbol vk = v(t) observed at step t
• At step t, the only nonzero emission contribution is the probability corresponding to the visible state actually observed
• Note: typo in caption of Fig. 3.10 in DHS; see errata

Algorithm (Forward)
1. Initialize: t ← 0, aij, bjk, visible sequence V^T, αj(0)
2. for t ← t + 1
3. αj(t) ← bjk v(t) Σi αi(t-1) aij
4. until t = T
5. return P(V^T) ← α0(T)

• αj(t): probability of being in state ωj at step t, having generated the first t elements of V^T
• α0(T): the value of α at step T for the known final state ω0, which gives P(V^T)
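Below is a runnable Python sketch of the forward recursion, paired with the brute-force rmax = c^T sum from the evaluation-problem slides as a cross-check. All model parameters here are made-up illustrations; also, instead of the pseudocode's single known initial and final states, this version uses an initial distribution pi and sums α over all final states, an equivalent common formulation.

```python
import numpy as np
from itertools import product

def forward(A, B, pi, obs):
    """Forward algorithm: computes P(V^T) in O(c^2 T) time.

    A[i, j] = P(omega_j at t+1 | omega_i at t)  (transition probabilities aij)
    B[j, k] = P(v_k | omega_j)                  (emission probabilities bjk)
    pi[j]   = P(omega_j at t = 1)               (initial-state distribution)
    obs     = observed symbol indices v(1), ..., v(T)
    """
    alpha = pi * B[:, obs[0]]            # alpha_j(1): start in omega_j and emit v(1)
    for v in obs[1:]:
        # alpha_j(t) = [sum_i alpha_i(t-1) * a_ij] * b_j,v(t)
        alpha = (alpha @ A) * B[:, v]
    return alpha.sum()                   # P(V^T), summed over possible final states

def brute_force(A, B, pi, obs):
    """Exhaustive evaluation over all c^T hidden sequences (the rmax = c^T sum)."""
    c, total = len(pi), 0.0
    for seq in product(range(c), repeat=len(obs)):
        p = pi[seq[0]] * B[seq[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[seq[t - 1], seq[t]] * B[seq[t], obs[t]]
        total += p
    return total

# Illustrative 2-state, 2-symbol model (values are assumptions, not from the notes).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],   # state 0 mostly emits symbol 0
              [0.2, 0.8]])  # state 1 mostly emits symbol 1
pi = np.array([0.5, 0.5])

obs = [0, 1, 0, 0]
print(forward(A, B, pi, obs))      # ~0.0642
print(brute_force(A, B, pi, obs))  # same value, exponentially slower
```

The recursion touches each of the c^2 transitions once per time step, so its cost grows as O(c^2 T), while the exhaustive sum enumerates all c^T hidden sequences.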