DOC PREVIEW
BYU BIO 465 - Hidden Markov Models: Applications in Bioinformatics

This preview shows page 1-2-3-4-5 out of 16 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Hidden Markov Models: Applications in BioinformaticsDefinitionDefinition (Cont’d)Important CalculationsApplications of HMMHMM for Sequence AlignmentHMM for Sequence Alignment (Con’t)Slide 8Slide 9HMM for Sequence Alignments: ProcedureSlide 11ExampleExample (Cont’d)Example (End)HMM: Strengths & WeaknessesPowerPoint PresentationHidden Markov Models: Applications in BioinformaticsGleb Haynatzki, Ph.D.Creighton UniversityMarch 31, 2003Definition•A Hidden Markov Model (HMM) is a discrete-time finite-state Markov chain coupled with a sequence of letters emitted when the Markov chain visits its states. States (QQ): q1 q2 q3 ... Letters (OO): O1 O2 O3Definition (Cont’d)•The sequence OO of emitted letters is called “the observed sequence” because we often know it while not knowing the state sequence QQ, which is in this case called “hidden”. •The triple represents the full set of parameters of the HMM, where P is the transition probability matrix of the Markov chain, B is the emission probability matrix, and denotes the initial distribution vector of the Markov chain. λ= (P, B, π)πImportant CalculationsGiven any observed sequence O O = (O1,…,OT)•and , efficiently calculate P(OO | )•and , efficiently calculate the hidden sequence Q Q = (q1,…,qT) that is most likely to have occurred; i.e. find argmaxQQ P(QQ | OO)•and assuming a fixed graph structure of the underlying Markov chain, find the parameters = maximizing P(OO | )λλ(P, B,π)λλλApplications of HMM•Modeling protein families: (1) construct multiple sequence alignments(2) determine the family of a query sequence•Gene finding through semi-Hidden Markov Models (semiHMM)HMM for Sequence AlignmentConsider the following Markov chain underlying a HMM, with three types of states:   “match”;  “insert”;   “delete”(HMM for Sequence Alignment (Con’t)•The alphabet A consists of the 20 amino acids and a “delete” symbol ( )•Delete states output only with probability 1•Each insert & match state has its own distribution over the 20 amino acids and does not output δδδHMM for Sequence Alignment (Con’t)There are two extreme situations depending on the HMM parameters:•The emission probs for the match & insert states are uniform over the 20 amino acids the model produces random sequences•Each state emits one specific amino acid with prob 1 & mi  mi+1 with prob 1 the model produces the same sequence always⇒⇒HMM for Sequence Alignment (Con’t)Between the two extremes consider a “family” of somewhat similar sequences:•A “tight” family of very similar sequences•A “loose” family with little similaritySimilarity may be confined to certain areas of the sequences – if some match states emit a few amino acids, while other match states emit all amino acids uniformly/randomlyHMM for Sequence Alignments: Procedure(A) Start with “training”, or estimating, the parameters of the model using a set of training sequences from the protein family(B) Next, compute the path of states most likely to have produced each sequence(C) Amino acids are aligned if both are produced by the same match state in their paths(D) Finally, indels are inserted appropriately for insertions and deletionsλImportant CalculationsGiven any observed sequence O O = (O1,…,OT)•and , efficiently calculate P(OO | )•and , efficiently calculate the hidden sequence Q Q = (q1,…,qT) that is most likely to have occurred; i.e. find argmaxQQ P(QQ | OO)•and assuming a fixed graph structure of the underlying Markov chain, find the parameters = maximizing P(OO | )λλ(P, B,π)λλλExample•Consider: CAEFDDH, CDAEFPDDH•Suppose the model has length 10, and the most likely paths for the two sequences are:m0m1m2m3m4d5d6m7m8m9m10 andm0m1i1m2m3m4d5 m6m7m8m9m10Example (Cont’d)The alignment induced is found by aligning positions generated by the same match state:m0 m1 m2 m3 m4 d5 d6 m7m8m9m10 C A E F D D H C D A E F P D D H m0 m1 i1 m2 m3m4 d5 m6m7m8m9m10Example (End)This leads to the following alignment: C– AEF–DDH CDAEFPDDHHMM: Strengths & Weaknesses•HMM aligns many sequences with little computing power•HMM allows the sequences themselves to guide the alignment•Alignments by HMM are sometimes ambiguous and some regions are left unaligned in the end•HMM weaknesses come from their strengths: the Markov property and stationarityThank


View Full Document

BYU BIO 465 - Hidden Markov Models: Applications in Bioinformatics

Documents in this Course
summary

summary

13 pages

Cancer

Cancer

8 pages

Ch1

Ch1

5 pages

GNUMap

GNUMap

20 pages

cancer

cancer

8 pages

SNPs

SNPs

22 pages

Load more
Download Hidden Markov Models: Applications in Bioinformatics
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hidden Markov Models: Applications in Bioinformatics and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hidden Markov Models: Applications in Bioinformatics 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?