Stanford CS 262 - Lecture 9 Notes

Scribed by: Huey T. Vo
Lecture date: 02/06/2007

Conditional Random Fields (cont.)

A CRF is a souped-up version of an HMM, allowing us to have a richer model for biological sequences. The state π_i can inspect the entire sequence x as well as the previous state π_{i-1}, knowing that the current position is i.

Feature Function

The score of a parse at position i+1 can be defined in this form:

    V_l(i+1) = max_k [ V_k(i) + g(k, l, x, i+1) ]

where

    g(k, l, x, i) = Σ_{j=1…n} w_j · f_j(k, l, x, i)

The feature function g encapsulates the picture above: the previous state k, the current state l, the entire sequence x, and the current position i+1. Each f_j is an indicator variable (taking a true/false value), and w_j is the weight (parameter) given to that variable.

In the original HMM, the function g depends only on the two states k and l and on two parameters, the transition and emission probabilities. So for the original HMM there are only 2 indicator variables: transition (did we transition from state k to state l or not) and emission (did we emit the current letter from state l or not), and their weights are the (log) transition and emission probabilities, respectively. But we could have more parameters; for instance, we might be interested in whether the letter H is the majority outcome around position i, given that we transitioned into the current state from some previous state.

Feature Dependency

In a CRF, a state depends on its neighboring states and not just its previous state. Furthermore, a state can inspect the entire sequence x. Viterbi can still be used to find the optimal parse in a CRF, because the sequence x is constant and we can therefore pre-compute g for all positions. Unlike an HMM, which can only have K^2 transition probabilities and K × |alphabet| emission probabilities (K = number of states), a CRF can have an arbitrary number of parameters. This allows greater flexibility in the model, but how do we train them?

[Figure: the state π_i can inspect the previous state π_{i-1} and the entire sequence x_1 … x_10.]

Conditional Training

A CRF is not a generative model. It cannot be, because at state π_1 it would need to see the rest of the sequence, which has not been generated yet. Basically, P(x) is not well defined in a CRF.

    P(x, π) = P(π|x) P(x)

In HMM training, given a training sequence x and a "true" parse π, our goal is to maximize P(x, π). Through P(π|x) we are trying to pick parameters that give a good parse π for the sequence x, and through P(x) we want parameters that make our model fit the training sequences and generate good sequences. There is an obvious trade-off between picking parameters that are best for both purposes. Since P(x) is not well defined in a CRF, this trade-off does not exist there: in CRF training we are only concerned with P(π|x).

Recall that

    F(j, x, π) = # of times that feature f_j occurs in (x, π) = Σ_{i=1…N} f_j(π_{i-1}, π_i, x, i)

In an HMM, if we denote by w_j (the weight of the j-th feature) the quantity log(a_kl) or log(e_k(b)), then we have:

    P(x, π) = exp[ Σ_{j=1…n} w_j · F(j, x, π) ]

Similarly, we have a score (not a probability) for the CRF model:

    Score(x, π) = exp[ Σ_{j=1…n} w_j · F(j, x, π) ]

Recall that to obtain P(x) we must add the probabilities of all possible paths that can generate the sequence x; hence we now have:

    P(x) = Σ_π P(x, π) = Σ_π exp[ Σ_{j=1…n} w_j · F(j, x, π) ] =: Z

[Figure: graphical structure of an HMM vs. a CRF over states π_1 … π_6 and letters x_1 … x_6.]

So now we can normalize Score(x, π) into a probability (the P_CRF values sum to 1 over all parses):

    P_CRF(π|x) = exp[ Σ_{j=1…n} w_j · F(j, x, π) ] / Z

Training Algorithm

The CRF training algorithm can now be summarized as follows (a short code sketch at the end of this section illustrates these steps):
1. Given a training set of sequences x and "true" parses π.
2. Compute Z by a sum-of-paths algorithm similar to the HMM forward algorithm; then compute P(π|x) = Score(x, π) / Z.
3. Compute the partial derivative of log P(π|x) with respect to each parameter w_j:

       d/dw_j log P(π|x) = F(j, x, π) − E[F(j, x, π)]

   i.e. the observed count of feature j minus its expected count under the current model.
4. Update the weights and continue until an optimal set of weights is found.

In summary, a CRF:
- Provides the ability to incorporate many non-local features into our model; it removes the independence assumptions that we have in HMMs.
- Trains parameters that are best for parsing, not for modeling (we cannot compute P(x)). However, CRF training is slower, as it has to do more iterations.
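To make the decoding and training steps above concrete, here is a minimal sketch of a two-state linear-chain CRF in Python. The particular indicator features, the toy sequence and "true" parse, the learning rate, and the brute-force enumeration of parses used to compute Z and the expected feature counts are assumptions made for this sketch (a real implementation would use the sum-of-paths dynamic program from step 2); only the Viterbi recursion V_l(i+1) = max_k[V_k(i) + g(k, l, x, i+1)] and the gradient F(j, x, π) − E[F(j, x, π)] follow the notes.

```python
# A minimal two-state linear-chain CRF sketch (illustrative only; the features,
# toy data, and learning rate are assumptions, not from the lecture).
import math
from itertools import product

STATES = [0, 1]  # e.g. "background" vs. "island"

def features(k, l, x, i):
    """Indicator features f_j(prev state k, current state l, sequence x, position i).
    The third feature looks at a window of x -- something an HMM cannot do."""
    return [
        1.0 if (k == 0 and l == 1) else 0.0,                                   # transition 0 -> 1
        1.0 if (l == 1 and x[i] in "GC") else 0.0,                             # emit G/C in state 1
        1.0 if (l == 1 and x[max(0, i - 2):i + 3].count("G") >= 2) else 0.0,   # non-local window feature
    ]

def g(k, l, x, i, w):
    """g(k, l, x, i) = sum_j w_j * f_j(k, l, x, i)."""
    return sum(wj * fj for wj, fj in zip(w, features(k, l, x, i)))

def viterbi(x, w):
    """Optimal parse via V_l(i+1) = max_k [ V_k(i) + g(k, l, x, i+1) ]."""
    N = len(x)
    V = [{l: g(None, l, x, 0, w) for l in STATES}]  # position 0: no previous state
    back = [{}]
    for i in range(1, N):
        V.append({}); back.append({})
        for l in STATES:
            best_k = max(STATES, key=lambda k: V[i - 1][k] + g(k, l, x, i, w))
            V[i][l] = V[i - 1][best_k] + g(best_k, l, x, i, w)
            back[i][l] = best_k
    path = [max(STATES, key=lambda l: V[N - 1][l])]
    for i in range(N - 1, 0, -1):           # trace back the optimal parse
        path.append(back[i][path[-1]])
    return list(reversed(path))

def F(x, pi):
    """F(j, x, pi): how many times feature j fires along parse pi."""
    counts = [0.0, 0.0, 0.0]
    prev = None
    for i, l in enumerate(pi):
        for j, fj in enumerate(features(prev, l, x, i)):
            counts[j] += fj
        prev = l
    return counts

def score(x, pi, w):
    """Score(x, pi) = exp[ sum_j w_j * F(j, x, pi) ]."""
    return math.exp(sum(wj * Fj for wj, Fj in zip(w, F(x, pi))))

def train_step(x, true_pi, w, lr=0.1):
    """One gradient-ascent step on log P(pi|x):
    d/dw_j log P(pi|x) = F(j, x, pi) - E[F(j, x, pi')].
    Z and the expectation are computed by brute-force enumeration of all parses
    (exponential in len(x)); a real implementation would use the sum-of-paths
    dynamic program mentioned in the notes instead."""
    parses = list(product(STATES, repeat=len(x)))
    scores = [score(x, p, w) for p in parses]
    Z = sum(scores)
    expected = [0.0, 0.0, 0.0]
    for p, s in zip(parses, scores):
        for j, Fj in enumerate(F(x, p)):
            expected[j] += (s / Z) * Fj
    observed = F(x, true_pi)
    return [wj + lr * (o - e) for wj, o, e in zip(w, observed, expected)]

# Toy training example: one sequence with a hand-made "true" parse.
x = "ATGCGCGATA"
true_pi = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
w = [0.0, 0.0, 0.0]
for _ in range(50):
    w = train_step(x, true_pi, w)
print("learned weights:", w)
print("Viterbi parse:  ", viterbi(x, w))
```

Brute-force enumeration is only feasible for this 10-letter toy; the point of the sketch is that the gradient really is "observed minus expected feature counts", and that Viterbi works unchanged because g can be evaluated (or pre-computed) for every position of the fixed sequence x.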
DNA Sequencing Concept

DNA, the basic genetic material of life, contains the information necessary for an organism to develop, live, reproduce, and pass its genetic information to the next generation (figure taken from http://www.genomenewsnetwork.org/). The human genome is made up of over 3 billion letters of genetic code divided into 46 chromosomes. Genome size does not correspond to simple hierarchies such as the food chain: there are other organisms with much more genetic material than humans that are much lower on the food chain.

DNA sequencing addresses the question of how we figure out the order of DNA nucleotides (bases, letters) in a genome. Given a piece of DNA, we need to translate it into a sequence of letters over the alphabet A, G, T, and C.

An obvious question: the genome of which human should we use for sequencing? The answer entails an interesting story: the race between the public and private sectors to decipher all of human DNA. In the early 1990s, an international effort was started to sequence the human genome. It was a publicly funded project coordinated by the US Department of Energy and the National Institutes of Health, and it involved many scientists around the world. The goal of the project was to complete sequencing the human genome by 2005 with a $3 billion budget. Some analysts were less optimistic and predicted that it would not be done until 2015.

The effort was slow initially. Around 1997-98, Gene Myers, a computer scientist, proposed a new sequencing methodology that could possibly speed up the sequencing process. Many did not believe in the new technique, except Dr. Craig Venter, who believed Myers and formed Celera Genomics, a company dedicated to generating and commercializing genomic information. Using Myers' technique, Celera claimed that it could finish sequencing the human genome by 2000 at a fraction of the cost ($300 million vs. the $3 billion budget of the public project). The race had begun. In the end, Celera "succeeded" only to some extent: it could not produce a good assembly of the human genome without using the public data. The race brought some positive effects to the public project, however: