Stanford CS 262 - Lecture Notes

CS262 Winter 2007, Lecture 5 (01/23/2007)
Hidden Markov Models: Decoding and Evaluation
Scribe: Chuan Sheng Foo

What is a Hidden Markov Model?

Hidden Markov Models (HMMs) are probabilistic models for representing biological sequences. They allow us to find genes, compute sequence alignments, and locate regulatory elements such as promoters in a principled manner.

The theory of HMMs will be introduced first, followed by the probabilistic interpretation of alignments using HMMs. Later in the course, we will see applications of HMMs to biological sequence modeling and feature discovery, the most notable being gene finding. Gene finding was perhaps the application that made HMMs popular in biological sequence analysis; such algorithms were first introduced around 1996-1997.

To introduce the concept of an HMM, consider the following simple game played at a casino: you roll a die, and so does the casino player. The casino player wins if he rolls a six and loses otherwise. The catch is that the casino player has two dice: a fair die, which rolls each of one through six with equal probability 1/6, and a loaded die, which turns up a six half the time and each of the other five numbers a tenth of the time. Furthermore, the casino player switches between the fair and loaded dice at random intervals that do not depend on how long he has been rolling the current die, to reduce his chances of getting caught cheating.

The Three Problems of HMMs

With this scenario in mind, we might be interested in the answers to the following questions.

1. The Evaluation Problem

First, given a sequence of rolls by the casino player (for example, one shown below with the sixes highlighted in red), how likely is the sequence given our understanding of how the casino works?
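The casino setup just described can be sketched as a small simulation. This is an illustrative sketch, not part of the notes: the function and state names are invented for the example, and the 0.95 stay probability is taken from the model diagram described later in these notes.

```python
import random

# A sketch of the dishonest casino, assuming a geometric (memoryless)
# switching rule; 0.95 matches the stay probability used later in the notes.
FAIR = [1 / 6] * 6                        # fair die: each face w.p. 1/6
LOADED = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]   # loaded die: a six half the time
STAY = 0.95                               # P(keep rolling the current die)

def play(n_rolls, seed=0):
    """Generate (hidden state, observed roll) pairs; only the rolls are seen."""
    rng = random.Random(seed)
    state, out = "F", []
    for _ in range(n_rolls):
        weights = FAIR if state == "F" else LOADED
        out.append((state, rng.choices(range(1, 7), weights=weights)[0]))
        if rng.random() > STAY:           # switch dice at a memoryless interval
            state = "L" if state == "F" else "F"
    return out

rolls = play(20)
```

Because the switching rule is memoryless, the time spent on one die is geometrically distributed, matching the notes' statement that switches do not depend on how long the current die has been in use.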
In other words, assuming that we have modeled the casino player's actions correctly (including when he switches dice), what is the probability that the given sequence was generated by his die rolls? This is known as the evaluation problem of HMMs.

2. The Decoding Problem

The second question: given the sequence of rolls by the casino player, is it possible to catch him cheating? More specifically, can we tell which positions were rolled by the loaded die and which by the fair die? This is known as the decoding problem of HMMs, in which we try to find the underlying sequence of states that produced the observed sequence. For example, for the sequence shown below, we might decode the states as shown, with the rolls in the blue regions produced by the fair die and those in the orange region produced by the loaded die.

3. The Learning Problem

Finally, given a sequence of rolls, and knowing that the casino player switches between a fair and a loaded die, how loaded is the loaded die? That is, what is the probability that it rolls a six? Similarly, how fair is the fair die? Does it roll all six numbers with equal probability? How often does the casino player switch between the two dice? Determining all these parameters of the model is what we call the learning problem of HMMs. For instance, in the following example, assuming that fair-loaded-fair is the correct state sequence, the loaded die is estimated to roll a six with probability 64%.
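The estimate in the learning problem is just the observed frequency of sixes within the segment attributed to the loaded die. A minimal sketch; the counts below are hypothetical, chosen only so the estimate reproduces the 64% figure from the notes' example:

```python
# Maximum-likelihood estimate for the learning problem: given rolls assumed
# to come from the loaded die, estimate P(six) by its observed frequency.
# These counts are hypothetical: 16 sixes out of 25 rolls.
loaded_segment = [6] * 16 + [1, 2, 3, 4, 5, 1, 2, 3, 4]

p_six = loaded_segment.count(6) / len(loaded_segment)
print(p_six)  # 16/25 = 0.64
```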
Although this may just be an especially lucky run of rolls by the loaded die, whose true probability of rolling a six is actually one half, without any other information our best estimate of that probability is 64%.

In the next few lectures, a principled approach to each of these three problems will be presented, together with efficient algorithms for solving them.

An Example HMM: The Dishonest Casino Model

The diagram shown above represents the dishonest casino model discussed before. There are two states, Fair and Loaded, representing which die the casino player is currently using. The arrows represent the decisions to stay in the same state or to switch to the other state, and the numbers beside the arrows are the probabilities of those decisions. In this example, the probability of staying in the same state is set to 0.95, but it may be set to another value to model a different process.

Furthermore, once we have decided whether or not to switch states, we emit one of the letters of the alphabet with a given emission probability. In this case, the alphabet corresponds to the possible values of a die roll, and the probabilities reflect whether the die is fair or loaded.

What is Hidden in HMMs?

Why is this model called a Hidden Markov Model? What is hidden from us? The answer is the states: when we observe the sequence of rolls, we cannot tell whether the casino player is rolling the fair or the loaded die, which corresponds to the state of the model. The state affects the outcome of each roll, but we cannot observe it directly.

The fact that the states are hidden from us is really the crux of HMMs.
For example, when we are given a biological sequence, interesting features such as genes and splice sites are hidden from us, and these features are exactly what we wish to find. In the next few lectures, we shall learn principled methods for finding good sequences of states: although we cannot catch the casino player with a smoking gun by pinpointing the exact rolls on which he cheated, we will be able to determine with high probability whether he was using the loaded die. Finding good sequences of states is the goal of studying HMMs.

The Markov Property

Now that we understand what is hidden in an HMM, let us consider its Markov property: an HMM is memoryless. This means that at each time step t, the only thing that affects future states, and the future letters emitted by the model, is the current state at time t.

Formal Definition of a HMM

The definition of a HMM is similar
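The Markov property above means that the joint probability of a roll sequence and a candidate state path factors into per-step transition and emission terms, each depending only on the current state. A minimal sketch using the casino parameters from these notes; the uniform initial-state distribution is an assumption, since the notes do not specify one:

```python
# Joint probability P(x, pi) of rolls x and hidden state path pi. By the
# Markov property, each factor depends only on the current state.
TRANS = {("F", "F"): 0.95, ("F", "L"): 0.05,
         ("L", "L"): 0.95, ("L", "F"): 0.05}
EMIT = {"F": {r: 1 / 6 for r in range(1, 7)},
        "L": {r: 0.5 if r == 6 else 0.1 for r in range(1, 7)}}
START = {"F": 0.5, "L": 0.5}   # assumed uniform initial distribution

def joint_prob(rolls, path):
    """P(x, pi) = START(pi_1) * EMIT(x_1) * prod_i TRANS(pi_{i-1}, pi_i) * EMIT(x_i)."""
    p = START[path[0]] * EMIT[path[0]][rolls[0]]
    for i in range(1, len(rolls)):
        p *= TRANS[(path[i - 1], path[i])] * EMIT[path[i]][rolls[i]]
    return p

# A run of sixes is far more probable under the loaded die:
x = [6, 6, 6]
print(joint_prob(x, "LLL") > joint_prob(x, "FFF"))  # True
```

Comparing such joint probabilities over candidate paths is exactly the decoding problem posed earlier; the evaluation problem instead sums this quantity over all possible paths.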

