STEVENS CS 559 - Fundamentals and Applications 6th Set of Notes


CS 559: Machine Learning Fundamentals and Applications
6th Set of Notes

Instructor: Philippos Mordohai
Webpage: www.cs.stevens.edu/~mordohai
E-mail: [email protected]
Office: Lieb 215

Project Proposal
• Project title
• Data set
• Project idea: What is the objective, what method(s) will be tested?
• Relevant papers
– Optional, but a good strategy
• Software you plan to write
• Experiments you plan to do
• Has to be approved by me before October 19

Overview
• Note on Overfitting (Week 5 notes)
• DHS Ch. 4 (Sec. 4.1-4.5) (Week 5 notes): Non-parametric Classification
– Density estimation
– Parzen windows
– K-Nearest Neighbor estimation
• Hidden Markov Models (DHS Sec. 3.10-3.10.5)
• Principal Component Analysis (DHS Sec. 3.8.1, essentially notes only)
– Intro
– Singular Value Decomposition

Hidden Markov Models
(based on Pattern Classification, DHS, Chapter 3)

Markov Chains
• Goal: make a sequence of decisions
– Processes that unfold in time; states at time t are influenced by the state at time t-1
– Applications: speech recognition, gesture recognition, part-of-speech tagging, DNA sequencing
– Any temporal process without memory
• ω^T = {ω(1), ω(2), ω(3), …, ω(T)} denotes a sequence of T states
• We might have ω^6 = {ω1, ω4, ω2, ω2, ω1, ω4}
– The system can revisit a state at different steps, and not every state needs to be visited

First-Order Markov Models
• The production of any sequence is described by the transition probabilities
P(ωj(t + 1) | ωi(t)) = aij
• Notes
– aij does not have to be symmetric
– aii is not 0 in general
• With model θ = (aij, T), the probability of the example sequence ω^6 above is
P(ω^T | θ) = a14 a42 a22 a21 a14

Example: speech recognition ("production of spoken words")
• Production of the word "pattern", represented by phonemes:
/p/ /a/ /tt/ /er/ /n/ /-/ (/-/ = silent state)
• Transitions from /p/ to /a/, /a/ to /tt/, /tt/ to /er/, /er/ to /n/, and /n/ to the silent state

Speech Analysis: [figure]

Example: 3 Coins
• Assume there are 3 coins:
– one biased towards Heads
– one biased towards Tails
– one non-biased
• The person behind the curtain tosses one coin repeatedly, then switches to another and tosses that ...
• Assume you see these observations:
HTHTHTHHTHTTHTTTTHTTTHTTTTTHHHHTHHTHHHH
• What would you think is the most likely explanation as to which coins he is using?

Definition
• Doubly stochastic process with an underlying stochastic process that is not observable (hidden), but can only be observed through another set of stochastic processes that produce the sequence of observed symbols
• The observations are the outcomes of the tosses
• The biased coins are the hidden states

Markov Chains: [figure]
• 1st-order Markov chain
• 2nd-order Markov chain
• 1st-order Markov chain with stochastic observations: an HMM
(A small code sketch of these ideas follows below.)
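To make the transition-probability computation and the 3-coin example concrete, here is a minimal Python sketch. All numeric values (the transition matrix and the per-coin head probabilities) are invented for illustration; the notes do not specify them, and the variable names are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical transition matrix A[i, j] = a_ij = P(omega_j at t+1 | omega_i at t).
# States: 0 = heads-biased coin, 1 = tails-biased coin, 2 = fair coin.
A = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])

def sequence_probability(states, A):
    """P(omega(2), ..., omega(T) | omega(1), theta): by the first-order
    Markov property, a product of one-step transition probabilities a_ij."""
    p = 1.0
    for i, j in zip(states, states[1:]):
        p *= A[i, j]
    return p

# Analogue of the lecture's P(omega^T | theta) = a14 a42 a22 a21 a14,
# here for the 3-state sequence {w1, w3, w2, w2, w1, w3} (0-indexed below).
print(sequence_probability([0, 2, 1, 1, 0, 2], A))

# Sampling the doubly stochastic process: the hidden coin switches follow A,
# and each hidden coin emits H or T according to its own bias.
p_heads = np.array([0.9, 0.2, 0.5])   # hypothetical per-coin head probabilities

def sample_hmm(T, A, p_heads, start=2):
    state, obs, hidden = start, [], []
    for _ in range(T):
        hidden.append(state)
        obs.append('H' if rng.random() < p_heads[state] else 'T')
        state = rng.choice(len(A), p=A[state])
    return ''.join(obs), hidden

print(sample_hmm(40, A, p_heads)[0])   # a string like the H/T sequence above
```

Runs of identical coins in the sampled string come from the large self-transition probabilities aii, which is exactly the structure one would guess when explaining the observed H/T sequence.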
Hidden Markov Model (HMM)
• Interaction of the visible states with the hidden states
• Transmission probabilities for the emission of visible states (a.k.a. emission probabilities):
bjk = P(Vk(t) | ωj(t)), with Σk bjk = 1 for all j
• 3 problems are associated with this model:
– The evaluation problem
– The decoding problem
– The learning problem

The Evaluation Problem
The probability that the model produces a sequence V^T of visible states is

P(V^T) = Σ_{r=1}^{rmax} P(V^T | ω_r^T) P(ω_r^T)    (1)

where each r indexes a particular sequence ω_r^T = {ω(1), ω(2), …, ω(T)} of T hidden states, and

P(V^T | ω_r^T) = Π_{t=1}^{T} P(v(t) | ω(t)),   P(ω_r^T) = Π_{t=1}^{T} P(ω(t) | ω(t-1))    (2)

Using equations (1) and (2), we can write

P(V^T) = Σ_{r=1}^{rmax} Π_{t=1}^{T} P(v(t) | ω(t)) P(ω(t) | ω(t-1))

Interpretation: the probability that we observe the particular sequence of T visible states V^T equals the sum, over all rmax possible sequences of hidden states, of the conditional probability that the system made a particular sequence of transitions multiplied by the probability that it then emitted the visible symbols in our target sequence.

Example: let ω1, ω2, ω3 be the hidden states, v1, v2, v3 the visible states, and V^3 = {v1, v2, v3} the observed sequence of visible states.
• First possibility: the hidden sequence is ω1 (t=1), ω2 (t=2), ω3 (t=3), contributing the term
P(ω1) P(v1 | ω1) P(ω2 | ω1) P(v2 | ω2) P(ω3 | ω2) P(v3 | ω3)
• Second possibility: the hidden sequence is ω2 (t=1), ω3 (t=2), ω1 (t=3), contributing
P(ω2) P(v1 | ω2) P(ω3 | ω2) P(v2 | ω3) P(ω1 | ω3) P(v3 | ω1)
• The full sum contains one such term for every possible hidden sequence: all 3^3 = 27 cases!
• In general rmax = c^T, where c is the number of states

Therefore:

P({v1, v2, v3}) = Σ_{possible sequences of hidden states} Π_{t=1}^{3} P(v(t) | ω(t)) P(ω(t) | ω(t-1))

The Forward Algorithm
• Direct computation is prohibitively expensive
• A feasible recursive alternative:

αj(t) = 0                               if t = 0 and j is not the initial state
      = 1                               if t = 0 and j is the initial state
      = [Σi αi(t-1) aij] bjk v(t)       otherwise

where bjk v(t) denotes bjk for the index k of the visible symbol actually observed at step t: the only nonzero contribution at step t is the emission probability corresponding to that visible state.
(Note: there is a typo in the caption of Fig. 3.10 in DHS; see the errata.)

Algorithm (Forward)
1. Initialize: t ← 0, aij, bjk, visible sequence V^T, αj(0)
2. for t ← t + 1
3.   αj(t) ← bjk v(t) Σi αi(t-1) aij
4. until t = T
5. return P(V^T) ← α0(T)

αj(t) is the probability of being in state ωj at step t, having generated the first t elements of V^T; ω0 here denotes the known final state (DHS's convention), so α0(T) is the desired probability.
(A Python sketch of this recursion, followed by a brute-force check of the direct sum, appears below.)
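The following is a minimal Python sketch of the forward recursion. Note one deviation from the pseudocode above: rather than a known initial state and a return value of α0(T) for the final state, this sketch uses a generic initial distribution π and sums the final α values, an equivalent and common formulation. The 3-coin numbers are the same invented ones as in the earlier sketch.

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm for the evaluation problem.
    After step t, alpha[j] = P(first t visible symbols, state omega_j at t).
    A[i, j] = a_ij (transitions), B[j, k] = b_jk (emissions),
    pi[j] = P(omega_j at t = 1), obs = indices of the observed symbols.
    Returns P(V^T)."""
    alpha = pi * B[:, obs[0]]              # initialization at t = 1
    for o in obs[1:]:
        # alpha_j(t) = [ sum_i alpha_i(t-1) a_ij ] * b_j,v(t)
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

# Hypothetical 3-coin model (invented numbers, as before):
A = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
p_heads = np.array([0.9, 0.2, 0.5])
B = np.column_stack([p_heads, 1.0 - p_heads])   # columns: symbol 0 = H, 1 = T
pi = np.full(3, 1.0 / 3.0)                      # assumed uniform start

obs = [0 if c == 'H' else 1 for c in 'HTHTHTHHTH']
print(forward(A, B, pi, obs))                   # P(V^T) for the first 10 tosses
```

Each step costs O(c^2) for c hidden states, so the whole evaluation is O(c^2 T) rather than the O(c^T T) of the direct sum.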

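As a sanity check on the rmax = c^T claim, the sketch below (again with invented numbers, this time a tiny 2-state model so the enumeration stays small) evaluates P(V^T) by summing over all c^T hidden sequences and confirms that the forward recursion gives the same value.

```python
import itertools
import numpy as np

# Tiny 2-state, 2-symbol HMM; all numbers invented for illustration.
A  = np.array([[0.7, 0.3], [0.4, 0.6]])   # a_ij
B  = np.array([[0.9, 0.1], [0.2, 0.8]])   # b_jk
pi = np.array([0.6, 0.4])
obs = [0, 1, 1, 0, 1]                     # T = 5 visible symbols

# Direct evaluation: P(V^T) = sum over all c^T hidden sequences of
# prod_t P(v(t) | omega(t)) P(omega(t) | omega(t-1)); here 2^5 = 32 terms.
total = 0.0
for seq in itertools.product(range(2), repeat=len(obs)):
    p = pi[seq[0]] * B[seq[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[seq[t - 1], seq[t]] * B[seq[t], obs[t]]
    total += p

# Forward recursion on the same model for comparison.
alpha = pi * B[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]

print(total, alpha.sum())   # the two values agree (up to float rounding)
```

Already at T = 40 (the length of the H/T string above) the direct sum would range over 3^40 hidden sequences, which is hopeless; this is exactly why the recursive formulation matters.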
