Lecture Notes for Lecture 4
Time Warping and Hidden Markov Models
CS 262, Spring 2003, Scribed by Christopher Sewell

Time Warping

The idea of aligning strings is similar to the process of time warping, in which two functions, $\xi(t)$ and $\eta(t)$, have analogous "shapes" but different "speeds". In other words, one is compressed and the other expanded with respect to the other in the time domain. A compression of two units into one is analogous to a deletion, while an expansion of one unit into two is analogous to an insertion. Thus the two trajectories may differ from each other by variations in speed as well as by normal additive random error. One common application of time warping is speech recognition, in which a recorded pattern must be matched to a reference pattern regardless of how fast or slow the recorded words were said.

We would like to define an approximate continuous time warping $(u_0, v_0)$, where $u_0(t)$ and $v_0(t)$ are strictly increasing functions over the domain $[0, T]$, such that

$$\xi(u_0(t)) \approx \eta(v_0(t))$$

Geometrically, each point in the one trajectory is "linked" to one point in the other trajectory. An initial approach to measuring the value of a time warping is to integrate a weight function $w$ (related to how far apart $\xi(u_0(t))$ and $\eta(v_0(t))$ are):

$$\int_0^T w(\xi(u_0(t)), \eta(v_0(t)))\, dt$$

However, this definition is not consistent: for an equivalent time warping (meaning one that "links" the same points in the two trajectories) $(u_1(s), v_1(s))$, where $s$ is some function of $t$, $s = f(t)$,

$$\int_0^S w(\xi(u_1(s)), \eta(v_1(s)))\, ds = \int_0^T w(\xi(u_0(t)), \eta(v_0(t)))\, f'(t)\, dt$$

which differs from the original definition by a factor of $f'(t)$.

A better objective function for time warping is:

$$\int_0^T w(\xi(u_0(t)), \eta(v_0(t)))\, \frac{u_0'(t) + v_0'(t)}{2}\, dt$$

With this formula, the definition is consistent for any equivalent time warping $(u_1(s), v_1(s))$, where $s$ is again some function of $t$, $s = f(t)$. Let $g(s)$ be the inverse of $f$, $g = f^{-1}$, so $t = g(s)$. Substituting for $t$, $f(t) = f(g(s)) = s$; differentiating gives $f'(g(s))\, g'(s) = 1$, so $g'(s) = 1/f'(t)$. Also $u_1(s) = u_0(g(s))$, so $u_1'(s) = u_0'(g(s))\, g'(s)$, and similarly for $v_1$. Therefore

$$\int_0^S w(\xi(u_1(s)), \eta(v_1(s)))\, \frac{u_1'(s) + v_1'(s)}{2}\, ds
= \int_0^T w(\xi(u_0(t)), \eta(v_0(t)))\, \frac{u_0'(t) + v_0'(t)}{2}\, g'(s)\, f'(t)\, dt
= \int_0^T w(\xi(u_0(t)), \eta(v_0(t)))\, \frac{u_0'(t) + v_0'(t)}{2}\, dt$$

If $\xi(t)$ and $\eta(t)$ are discretized into $a = a_0 \ldots a_M$ and $b = b_0 \ldots b_N$, respectively, we define a discrete time warping $(u[h], v[h])$, whose components take as input a point along a common time scale $[0..H]$ and return a time point index on the original time scales, $[0..M]$ for $u$ and $[0..N]$ for $v$. We want

$$a_{u[h]} \approx b_{v[h]} \quad \text{for all } h = 1 \ldots H$$

subject to the constraints that the full range of both trajectories is included:

$$u[0] = v[0] = 0, \qquad u[H] = M, \qquad v[H] = N$$

In order to minimize the discrete version of the objective function,

$$w_{u,v}(a, b) = \sum_{h=0}^{H} w(a_{u[h]}, b_{v[h]})\, \frac{\Delta u + \Delta v}{2}$$

we can use dynamic programming. If $(\Delta u, \Delta v)$, the possible differences in $u$ and $v$ between steps $h-1$ and $h$, are allowed only to be integer combinations totaling two, then $(\Delta u + \Delta v)/2 = 1$, and the objective function simplifies to

$$w_{u,v}(a, b) = \sum_{h=0}^{H} w(a_{u[h]}, b_{v[h]})$$

Letting $i = u[h]$ and $j = v[h]$, the dynamic programming matrix $D$, with columns $u = 0..M$ and rows $v = 0..N$, can be defined in terms of the recurrence

$$D(i, j) = \min \begin{cases} D(i-2, j) + w(i, j) \\ D(i-1, j-1) + w(i, j) \\ D(i, j-2) + w(i, j) \end{cases}$$

where $w(i, j) = w(a_i, b_j) = w(a_{u[h]}, b_{v[h]})$, filling in from left to right, and down each column. Initialize the first two rows and columns as:

$$D(i, 0) = \frac{1}{2} \sum_{i'=0}^{i} w(i', 0), \qquad D(0, j) = \frac{1}{2} \sum_{j'=0}^{j} w(0, j'), \qquad D(1, j) = D(i, 1) = w(i, j) + w(i-1, j-1)$$

This definition, which allows only steps of length two, introduces the constraint that only cells for which $i + j$ is even are used. This reduces computation time by a factor of two, but it disregards some information from the original trajectories. This loss can be compensated for by increasing the sampling rate by a factor of two.
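To make the dynamic program concrete, here is a minimal Python sketch of the recurrence above. The notes leave the weight function $w$ abstract, so the sketch assumes the absolute difference $w(a_i, b_j) = |a_i - b_j|$; the function name discrete_time_warp and the NumPy representation are illustrative choices, and the notes' special cases for the first two rows and columns are folded into the half-weighted boundary sums plus the general recurrence.

    import numpy as np

    def discrete_time_warp(a, b, w=lambda x, y: abs(x - y)):
        """Discrete time warping via the dynamic program above.
        Steps (du, dv) are restricted to (2,0), (1,1), (0,2), so only
        cells with i + j even are used.  Returns the optimal cost D(M, N)."""
        M, N = len(a) - 1, len(b) - 1      # time indices run 0..M and 0..N
        assert (M + N) % 2 == 0, "endpoint (M, N) requires M + N even"
        D = np.full((M + 1, N + 1), np.inf)

        # Boundary: D(i, 0) = 1/2 * sum_{i'<=i} w(a_i', b_0), and symmetrically.
        D[0, 0] = 0.5 * w(a[0], b[0])
        for i in range(1, M + 1):
            D[i, 0] = D[i - 1, 0] + 0.5 * w(a[i], b[0])
        for j in range(1, N + 1):
            D[0, j] = D[0, j - 1] + 0.5 * w(a[0], b[j])

        # Fill left to right, down each column, skipping unreachable cells.
        for i in range(1, M + 1):
            for j in range(1, N + 1):
                if (i + j) % 2 == 1:
                    continue                          # i + j odd: never used
                best = D[i - 1, j - 1]                # diagonal step (1, 1)
                if i >= 2:
                    best = min(best, D[i - 2, j])     # step (2, 0): "deletion"
                if j >= 2:
                    best = min(best, D[i, j - 2])     # step (0, 2): "insertion"
                D[i, j] = best + w(a[i], b[j])
        return D[M, N]

    # Example: b is roughly a twice-compressed version of a.
    print(discrete_time_warp([0, 1, 2, 3, 4, 5, 6, 7], [0, 2, 4, 6]))

The step (2, 0) consumes two samples of $a$ against one of $b$ (a compression, analogous to a deletion), and (0, 2) the reverse (an expansion, analogous to an insertion), matching the string-alignment analogy at the start of the section.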
Hidden Markov Models

A Hidden Markov Model consists of an alphabet, $\Sigma = \{b_1, b_2, \ldots, b_M\}$, and a set of states, $Q = \{1, \ldots, K\}$. Each state has a set of emission probabilities for events, given that the system is in that state, written $e_k(b) = P(x_i = b \mid \pi_i = k)$. The emission probabilities for all events in any state sum to 1: $e_k(b_1) + \ldots + e_k(b_M) = 1$ for all states $k = 1..K$. There are also probabilities for transitions between each pair of states, $a_{kl}$ representing the probability of going from state $k$ to state $l$, subject to the constraint that the transition probabilities out of any state sum to 1: $a_{k1} + \ldots + a_{kK} = 1$ for all states $k = 1..K$. Finally, there are start probabilities, $a_{0k}$, which are the probabilities that the system starts in each state; again these sum to 1: $a_{01} + \ldots + a_{0K} = 1$. A Hidden Markov Model is memoryless, meaning that at each time step only the current state affects the probability of reaching future states:

$$P(\pi_{i+1} = k \mid \pi_0, \pi_1, \ldots, \pi_i, x_0, x_1, \ldots, x_i) = P(\pi_{i+1} = k \mid \pi_i)$$

A model $M$ consists of an architecture (the number of states, etc.) and the emission and transition probabilities, $\theta = (e_k(\cdot), a_{kl})$. Thus we write $P(x \mid \theta) = P(x)$ and $P(x, \pi \mid M) = P(x, \pi)$ when the architecture and the entire model are implied.

Hidden Markov Models can be used for three purposes.

Evaluation: How likely is this sequence of events, given our model of how the system works? In other words, given an HMM $M$ and a sequence $x$, find $P(x \mid M)$.

Decoding: What portion of the sequence of events was generated in each state? In other words, given an HMM $M$ and a sequence $x$, find the sequence of states $\pi$ that maximizes $P(x, \pi \mid M)$.

Learning: What are the system parameter values (emission probabilities, start probabilities, transition probabilities)? In other words, given an HMM $M$ with unspecified transition and emission probabilities and a sequence $x$, find parameters $\theta = (e_k(\cdot), a_{kl})$ that maximize $P(x \mid \theta)$.

As an example, a Hidden Markov Model can be used to model a dishonest casino, which switches back and forth every twenty rolls between a fair die, with which the probability of rolling any integer from 1 to 6 is 1/6, and a loaded die, with which the probability of rolling a 6 is 1/2 and the probability of rolling any integer from 1 to 5 is 1/10. The model can be used to answer the three questions above.
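To make the dishonest-casino model concrete, here is a minimal Python sketch of its parameters and of the evaluation problem, solved with the standard forward algorithm. The preview ends before the notes reach the algorithms, so the forward algorithm's appearance here is an assumption, as are the per-roll switch probability of 1/20 = 0.05 standing in for "every twenty rolls" and the uniform start probabilities.

    import numpy as np

    STATES = ["Fair", "Loaded"]
    # Emission probabilities e_k(b) for faces b = 1..6 (columns 0..5).
    E = np.array([
        [1/6] * 6,                   # fair die: each face with probability 1/6
        [1/10] * 5 + [1/2],          # loaded die: a 6 with probability 1/2
    ])
    # Transition probabilities a_kl; "switches every twenty rolls" is
    # modeled as a per-roll switch probability of 1/20 (an assumption).
    A = np.array([
        [0.95, 0.05],
        [0.05, 0.95],
    ])
    A0 = np.array([0.5, 0.5])        # start probabilities a_0k (assumed uniform)

    def evaluate(x):
        """Evaluation problem: P(x | M) via the forward algorithm.
        x is a sequence of rolls, each in 1..6."""
        f = A0 * E[:, x[0] - 1]             # f_k(0) = a_0k * e_k(x_0)
        for b in x[1:]:
            f = (f @ A) * E[:, b - 1]       # f_l(i) = e_l(x_i) * sum_k f_k(i-1) a_kl
        return f.sum()

    print(evaluate([1, 6, 6, 2, 6, 6, 6, 3, 6, 6]))

For long roll sequences the forward values underflow in plain probabilities, so a scaled or log-space variant would be used in practice.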
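The decoding problem for the same model can be sketched with the Viterbi algorithm, again a standard method rather than one shown in this preview. The sketch reuses the assumed casino parameters and works in log space to avoid underflow.

    import numpy as np

    # Assumed casino parameters, as in the previous sketch.
    E = np.array([[1/6] * 6, [1/10] * 5 + [1/2]])
    A = np.array([[0.95, 0.05], [0.05, 0.95]])
    A0 = np.array([0.5, 0.5])

    def decode(x):
        """Decoding problem: the state path pi maximizing P(x, pi | M),
        computed with the Viterbi algorithm in log space."""
        logA, logE, logA0 = np.log(A), np.log(E), np.log(A0)
        n, K = len(x), len(A)
        V = np.empty((n, K))              # V[i, k]: best log-prob ending in state k
        ptr = np.zeros((n, K), dtype=int) # back-pointers for traceback
        V[0] = logA0 + logE[:, x[0] - 1]
        for i in range(1, n):
            scores = V[i - 1][:, None] + logA   # scores[k, l] = V[i-1, k] + log a_kl
            ptr[i] = scores.argmax(axis=0)
            V[i] = scores.max(axis=0) + logE[:, x[i] - 1]
        # Trace back the most probable state path.
        path = [int(V[-1].argmax())]
        for i in range(n - 1, 0, -1):
            path.append(int(ptr[i][path[-1]]))
        return ["Fair" if k == 0 else "Loaded" for k in path[::-1]]

    print(decode([1, 2, 6, 6, 6, 6, 6, 3, 4, 1]))

A run of "Loaded" in the output marks the stretch of rolls that the most probable state path attributes to the loaded die.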

