DOC PREVIEW
Stanford CS 262 - Lecture Notes

This preview shows page 1 out of 3 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 3 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 3 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS262: Computational genomicsCS 262 – Problem Set 4(due at the beginning of class on Thursday, Mar 11)Collaboration is allowed, but you must submit separate writeups. Please alsowrite the names of all your collaborators on your submissions.1. Motif Finding(a) Recall the CONSENSUS algorithm (review page 572 of the Hertz and Stormopaper). Show an example with N sequences X1, . . . ,XN where the number ofsaved alignments in each cycle is 5, and a 5-long motif is sought. Make yourexample such that the best 5-long motif is NOT found, but would be found if6 alignments were saved each turn.(b) Recall the example shown in class on the importance of initial conditions forEM. The sequences X1, . . . ,XN contain:• all words of length k consisting only of A and T (2k such words)• all words of length k consisting only of C and G (2k such words)• D words ACGT…..ACGT (assume k is a multiple of 4)Consider two different sets of local maximum parameters:i. Calculate the log likelihood P(X1, . . . ,XN, Z|θ, λ) where Z are the optimalassignments of each of the 2k+1 + D words, in each of the two categories(motif/background), for the above parameters if k = 8, D = 1. Which localoptimum is best?ii. Repeat the previous part for k = 8, D = 2k+1.iii. Still assuming k = 8, what happens for 1 < D < 2k+1? (Optional: providenice plots for each parameter set)2. Stochastic Context Free Grammars(a) In this problem you will be asked to model an imaginary hairpin loop with aContext Free Grammar. The Imaginary Hairpin Loop (IHL) is as follows:1CS262: Computational genomicsDescribe a CFG that produces all such loops, where pairs can be A−U, C −G,or G − U. Within the loop, the two positions next to the last pair cannot beA − U, C − G, or G − U. The bulge can be anything.(b) Now modify this to get a SCFG. Make your grammar such that the followingconditions are satisfied:i. A structure with a G − U pair is 1/2 as likely as the same structure withan A−U or G−C pair instead of the G−U. A−U and G−C structuresare equally likely.ii. A structure with a length 1 bulge (in the special bulge position) is equallylikely to one with a length 0 bulge.iii. A structure with a length 4 loop and one with a length 5 loop are equallylikely.(c) Convert your SCFG to Chomsky Normal Form, and adjust the probabilities tokeep the same defined likelihoods for all structures.(d) Given an n-long sequence x = x1. . .xn, what is the time required to obtain theoptimal local alignment of the above SCFG to x? Explain what algorithm youwould use, and give a detailed time analysis.3. RNA FoldingLet S be a string of n characters over the RNA alphabet A,C,G,U. In RNA secondary structure formation, RNA folds in configurations that favor pairings A − U, C −G, and G−U. Every such pairing decreases the energy of the fold, and the RNA folds in a way that minimizes the total energy. A loop in a nested pairing of S = s1 . . . sn is a subsequence si+1 . . . sj−1 suchthat no pairing exists between si+1 . . . sj−1 and there is a pairing (si, sj). Theenergy function we discussed so far does not model loops, which may contributeto the free energy of the structure.A loop function L(S) can model the loop energies. Modify the Nussinov algorithm, andanalyze the new running time, if:i. L(S) is an arbitrary function S  R, where S is the set of all strings onA,C,G, U.ii. L(s1 . . . si) = c0 + L0(s1) + . . .+ L0(si), where c0 is a constant, and L0(sj) areindependent from one another (separate scores for A,C,G,U).2CS262: Computational genomicsiii. A pseudoknot is a set of 10 consecutive crossing base pairs. Modify the Nussinov algorithm to allow for the presence of at most 1 pseudoknot, where the cost of the pseudoknot, if it exists, is S. 4. Evolutionary trees (Will be discussed on Tuesday, March 9)i. Give a simple algorithm that can determine in O(n2) time if an n by n matrix is an ultrametric matrix.ii. Give a simple linear-time algorithm to determine whether two phylogenies are isomorphic. The fact that the leaves have distinct labels is


View Full Document

Stanford CS 262 - Lecture Notes

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?