CMU BSC 03711 - Problem

Fall 2009 Computational Genomics and Molecular Biology

Problem Set 3
Due Thursday, November 19th

Collaboration is allowed on this homework. You must hand in homeworks individually and list the names of the people you worked with.

1. Suppose you are using the Gibbs sampler to find a conserved pattern of length five in the following sequences:

   T T A A A C A G
   G T T A A G C A A C A C A T A
   C A C C G A G G C A T G A T
   C T T T G G G C A G A T A T A C A
   G C T T G G C A G T C A C C C

   In the current iteration of the Gibbs sampler, $t^* = t_1$. Construct the PSSM for the pattern that starts at positions $i_2 = 4$, $i_3 = 6$, $i_4 = 5$, and $i_5 = 4$, using the following four steps:

   (a) Construct the frequency matrix using the pseudocount $b = 1$ that we used in class.
   (b) Calculate the propensity matrix, $P[i, j]$, for the pattern, using the background frequencies $p_i = 0.25$, $i \in \{A, C, G, T\}$.
   (c) Calculate the log-odds scoring matrix, $S[i, j]$, for the pattern.
   (d) Use $S[i, j]$ to score all valid windows in the sequence $t^*$. What is the highest-scoring five-mer in $t^*$?

2. Define an HMM H with three states {A, B, C}, alphabet {0, 1, 2}, and the following transition and emission probabilities:

            Transition          Emission
            A    B    C         0    1    2
      A    0.2  0.8  0.0       0.8  0.2  0.0
      B    0.0  0.8  0.2       0.0  0.6  0.4
      C    0.4  0.0  0.6       0.2  0.0  0.8

   (a) Draw the state diagram of this HMM and show the transition probabilities.
   (b) Assuming that the initial state is A, give all of the possible state paths for the sequence O = 0, 1, 2, 0.
   (c) What is $P(O)$?
   (d) What is the most probable path, $Q^*$? What is $P(O \mid Q^*)$, the probability of O for this path?
   (e) The Forward algorithm calculates $P(O)$, the probability that the model will emit a given sequence O, summed over all possible paths. One might consider approximating this probability by calculating $P(O \mid Q^*)$ using the Viterbi algorithm. For this particular HMM, would $P(O \mid Q^*)$ be a good approximation of $P(O)$? Explain your reasoning.

3. There are three stop codons: TAG, TAA, and TGA. In the extremophile bacterium Deinococcus radiodurans, the relative frequencies of these stop codons are 5%, 12%, and 83%, respectively. Construct an HMM that emits these stop codons, and only these codons, using the smallest number of states possible. (Fewer than nine states are required.)

   Give the topology of the model and the initial, transition, and emission probabilities. Your model should emit the three alternate stop codons in the correct frequencies. There is more than one solution; just give one.

4. Consider the following HMM that emits sequences, drawn from the two-letter alphabet $\Sigma = \{H, L\}$, that participate in coiled-coil structures. All paths begin in the initial state, $S_i$, and end in the termination state, $S_t$. Note that these are not silent states. The transition and emission probabilities for this HMM are:

               Transition probabilities                        Emission probabilities
            Si   A    B    C    D    E    F    G    St           H    L
      Si   0.5  0.5  0.0  0.0  0.0  0.0  0.0  0.0  0.0          0.5  0.5
      A    0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0          0.9  0.1
      B    0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0          0.2  0.8
      C    0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0          0.2  0.8
      D    0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0          0.9  0.1
      E    0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0          0.2  0.8
      F    0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0          0.2  0.8
      G    0.0  0.5  0.0  0.0  0.0  0.0  0.0  0.0  0.5          0.2  0.8
      St   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0          0.5  0.5

   Coiled coils are protein structural motifs formed by intertwined alpha helices. Sequences that participate in coiled coils have a characteristic seven-fold repeat. The seven positions in this pattern are typically denoted by the letters A-G. The residues in the A and D positions are at the interface of the two helices and, hence, are hydrophobic. The residues in the B, C, E, F, and G positions are exposed and typically hydrophilic.

   (a) Draw the state diagram for this HMM. Label the transitions with their probabilities.
   (b) Given an input sequence of length 5, how many corresponding state sequences with nonzero probability are there for this HMM? What are they?
   (c) Given an input sequence of length 10, how many corresponding state sequences with nonzero probability are there for this HMM? What are they?
   (d) Can this HMM emit a sequence that does not contain a coiled coil? Explain your answer. If not, how would you modify the HMM to emit such a sequence?
   (e) Coiled-coil sequences typically have two or more consecutive copies of the seven-fold repeat. How does the probability of coiled-coil sequences generated by this HMM vary with copy number? Are sequences with two copies of the seven-fold repeat more likely than sequences with just one copy?
   (f) Coiled coils sometimes contain an “offset,” i.e., insertions of a few amino acids between copies of the seven-fold repeat. Can this HMM emit coiled-coil sequences with an offset? Explain your answer. If not, how would you modify the HMM to emit sequences with an offset between the copies?

5. A palindrome is a string that reads the same forward and backward, such as “A man, a plan, a canal, Panama” (ignoring capitalization and punctuation). In molecular biology, a palindrome is a double-stranded DNA sequence in which the sequence of one strand in the 5’ to 3’ direction is the same as the sequence of the opposite strand in its 5’ to 3’ direction. Many restriction enzyme recognition sites are palindromes, such as the EcoRI binding site:

      5’-GAATTC-3’
      3’-CTTAAG-5’

   (a) Suppose you wanted to model palindromic sequences of arbitrary length. Why would a PSSM be a poor formalism for modeling palindromic sequences?
   (b) Discuss the pros and cons of using an HMM to model palindromic sequences of any length. In what way would it be an improvement over a PSSM? What obstacles would not be resolved by the HMM formalism? You may find it useful to draw a figure in answering your question.

6. All coiled coils have the length-7 repeated pattern, but the pattern is somewhat different in coiled coils that participate in 4-helix bundles and those that participate in coiled-coil pairs. Suppose you have two separate HMMs, one that models coiled-coil pairs and one that models 4-helix bundles. You wish to determine if a novel coiled-coil sequence participates in a pair or a 4-helix bundle. How would you do this? What quantities would you calculate and what algorithms
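
A minimal Python sketch of steps (a) to (d) of Problem 1 follows. It assumes the five sequences are t1 through t5 in the order listed, and it reads the four aligned sites from the 1-based start positions given in the problem. The exact in-class pseudocount rule is not stated here, so the sketch assumes b is added to every residue count in each column, giving q = (count + b) / (N + 4b) with N = 4 aligned sites. Taking the log odds in base 2 is also an assumption. If the course used a different convention, only `frequency_matrix` and `log_odds_matrix` need to change.

```python
import math

# Sequences as read from Problem 1 (t1..t5); t* = t1 is the held-out sequence.
SEQS = [
    "TTAAACAG",
    "GTTAAGCAACACATA",
    "CACCGAGGCATGAT",
    "CTTTGGGCAGATATACA",
    "GCTTGGCAGTCACCC",
]
# 0-based list index -> 1-based start position (i2, i3, i4, i5).
STARTS = {1: 4, 2: 6, 3: 5, 4: 4}
W = 5
ALPHABET = "ACGT"
BACKGROUND = {a: 0.25 for a in ALPHABET}


def frequency_matrix(sites, b=1.0):
    """Column frequencies with pseudocount b added to each residue count.

    ASSUMPTION: q[a][j] = (count + b) / (N + 4b); the in-class rule may differ.
    """
    n = len(sites)
    return {
        a: [(sum(s[j] == a for s in sites) + b) / (n + len(ALPHABET) * b) for j in range(W)]
        for a in ALPHABET
    }


def propensity_matrix(q):
    """P[a][j] = q[a][j] / p_a."""
    return {a: [q[a][j] / BACKGROUND[a] for j in range(W)] for a in ALPHABET}


def log_odds_matrix(p):
    """S[a][j] = log2 P[a][j] (base 2 is an assumption)."""
    return {a: [math.log2(v) for v in p[a]] for a in ALPHABET}


def score(window, s):
    """Sum the per-column log-odds scores of a window."""
    return sum(s[a][j] for j, a in enumerate(window))


sites = [SEQS[k][i - 1:i - 1 + W] for k, i in STARTS.items()]
S = log_odds_matrix(propensity_matrix(frequency_matrix(sites)))
t_star = SEQS[0]
# Every valid window of length W in t*, with its 1-based start position.
scored = [(i + 1, t_star[i:i + W], score(t_star[i:i + W], S))
          for i in range(len(t_star) - W + 1)]
best = max(scored, key=lambda item: item[2])
for pos, window, value in scored:
    print(pos, window, round(value, 3))
print("highest-scoring window:", best)
```

Printing the intermediate matrices (the return values of `frequency_matrix` and `propensity_matrix`) is a convenient way to compare each step against a hand calculation.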

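The Forward and Viterbi recursions behind Problem 2 (c) to (e) can be sketched in Python as follows. The parameters are copied from the Problem 2 table. The start distribution encodes an assumption: "initial state is A" is read as the first symbol being emitted from A with probability 1. The `viterbi` function returns the joint probability of O and the best path; whether that is the quantity the problem writes as $P(O \mid Q^*)$ depends on the convention used in class.

```python
# Parameters transcribed from the Problem 2 table.
STATES = ["A", "B", "C"]
TRANS = {
    "A": {"A": 0.2, "B": 0.8, "C": 0.0},
    "B": {"A": 0.0, "B": 0.8, "C": 0.2},
    "C": {"A": 0.4, "B": 0.0, "C": 0.6},
}
EMIT = {
    "A": {0: 0.8, 1: 0.2, 2: 0.0},
    "B": {0: 0.0, 1: 0.6, 2: 0.4},
    "C": {0: 0.2, 1: 0.0, 2: 0.8},
}
# ASSUMPTION: the path starts in A with probability 1, and A emits O[0].
START = {"A": 1.0, "B": 0.0, "C": 0.0}


def forward(obs):
    """Return P(O), summed over all state paths."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[r] * TRANS[r][s] for r in STATES) * EMIT[s][o]
            for s in STATES
        }
    return sum(alpha.values())


def viterbi(obs):
    """Return (most probable path, joint probability of O and that path)."""
    delta = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        ptr, new = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: delta[r] * TRANS[r][s])
            ptr[s] = prev
            new[s] = delta[prev] * TRANS[prev][s] * EMIT[s][o]
        back.append(ptr)
        delta = new
    last = max(STATES, key=delta.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1], delta[last]


O = [0, 1, 2, 0]
p_o = forward(O)
q_star, p_star = viterbi(O)
print("P(O) =", p_o)
print("Q* =", "".join(q_star), "with joint probability", p_star)
```

Comparing `p_o` with `p_star` on this input is one way to explore part (e) numerically; a single example does not by itself settle the general question.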

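One way to check a candidate answer to Problem 3 is to enumerate every string a model can emit and sum the path probabilities for each string. The Python sketch below does this with exact fractions. The five-state model in it (state names `T1`, `A2`, `G2`, `X3`, `A3`) is a hypothetical example, not an official solution, and five states is not claimed to be the minimum. States with no outgoing transitions are treated as ending the sequence.

```python
from fractions import Fraction as F

# HYPOTHETICAL candidate model for Problem 3 (one of several possible answers).
START = {"T1": F(1)}
TRANS = {
    "T1": {"A2": F(17, 100), "G2": F(83, 100)},  # first base T; branch on second base
    "A2": {"X3": F(1)},                          # second base A: TAG or TAA follows
    "G2": {"A3": F(1)},                          # second base G: only TGA may follow
    "X3": {},                                    # third base after TA (end of codon)
    "A3": {},                                    # third base after TG (end of codon)
}
EMIT = {
    "T1": {"T": F(1)},
    "A2": {"A": F(1)},
    "G2": {"G": F(1)},
    "X3": {"G": F(5, 17), "A": F(12, 17)},
    "A3": {"A": F(1)},
}


def emitted_distribution(length=3):
    """Sum P(path, string) over all state paths that emit `length` symbols."""
    dist = {}

    def walk(state, prob, prefix):
        for symbol, e in EMIT[state].items():
            p, s = prob * e, prefix + symbol
            if len(s) == length:
                dist[s] = dist.get(s, F(0)) + p
            else:
                for nxt, t in TRANS[state].items():
                    walk(nxt, p * t, s)

    for state, p0 in START.items():
        walk(state, p0, "")
    return dist


dist = emitted_distribution()
for codon, p in sorted(dist.items()):
    print(codon, p)
```

Exact fractions make it easy to confirm that the emitted frequencies match 5%, 12%, and 83% without rounding error, and that no other three-letter string has nonzero probability.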

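For Problem 4 (b) and (c), every state in the table emits both H and L with nonzero probability, and $S_i$ and $S_t$ are emitting states, so the set of nonzero-probability state paths depends only on the sequence length. The Python sketch below enumerates those paths by depth-first search over the nonzero transitions. The problem does not say whether a path of the given length must already have reached $S_t$; the `require_end` flag (a hypothetical parameter) lets you try both readings.

```python
# Nonzero transitions transcribed from the Problem 4 table.
TRANS = {
    "Si": {"Si": 0.5, "A": 0.5},
    "A": {"B": 1.0},
    "B": {"C": 1.0},
    "C": {"D": 1.0},
    "D": {"E": 1.0},
    "E": {"F": 1.0},
    "F": {"G": 1.0},
    "G": {"A": 0.5, "St": 0.5},
    "St": {"St": 1.0},
}


def nonzero_paths(n, require_end=True):
    """All state paths of length n that start in Si and have nonzero probability.

    ASSUMPTION: with require_end=True, a path must finish in St (a complete
    run through the model); with require_end=False, prefixes also count.
    """
    paths = []

    def extend(path):
        if len(path) == n:
            if not require_end or path[-1] == "St":
                paths.append(path)
            return
        for nxt in TRANS[path[-1]]:
            extend(path + [nxt])

    extend(["Si"])
    return paths


for n in (5, 10):
    for flag in (True, False):
        found = nonzero_paths(n, require_end=flag)
        print(f"length {n}, require_end={flag}: {len(found)} path(s)")
        for p in found:
            print("   " + " ".join(p))
```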

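The double-stranded definition of a palindrome in Problem 5 amounts to a sequence being equal to its own reverse complement. A minimal Python check, using the EcoRI site as input, is sketched below. It assumes an uppercase A/C/G/T alphabet with no ambiguity codes.

```python
# Watson-Crick complement for an uppercase A/C/G/T alphabet (assumption: no IUPAC codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq):
    """True if seq equals its own reverse complement."""
    return seq == reverse_complement(seq)


print(reverse_complement("GAATTC"), is_dna_palindrome("GAATTC"))
print(reverse_complement("GAATTG"), is_dna_palindrome("GAATTG"))
```

Under this definition an odd-length sequence can never qualify, since its middle base would have to be its own complement.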