03-511/711 Computational Genomics and Molecular Biology, Fall 2009 1Study questionsThese study problems are intended to help you to review for the final exam. This is by no meanan exhaustive list of the topics covered in the class and there is no guarantee that these questionsare representative of the questions on the final exam. You should also re view your class notes, thesyllabus and your home work assignments.1. Pairwise sequence alignment(a) Compare, dynamic programming for local, global and semiglobal alignment in termsof when they should be used; differences in initialization and termination condition;recursion relation, which scoring functions are suitable.2. Multiple sequence alignment(a) What are the differences between the exact dynamic programming for multiple sequencealignment and the progressive alignment heuristic?(b) Score the following multiple sequence alignment using sum-of-pairs and tree alignment.Compare the scores and explain why they differ?AGGAT03-511/711 Computational Genomics and Molecular Biology, Fall 2009 23. For each of the phylogeny reconstruction methods listed below,(a) which typ e s of information can they infer?(b) If a method cannot infer all of the types listed, what biological or algorithmic assumptionsimpose these limitations?(c) Which reconstruction methods are appropriate for each of the types of data listed above?Why?Phylogeny reconstruction methods• UPGMA• Neighbor Joining• minimum evolution• maximum parsimony• maximum likelihoodInformation inferred by such methods• unrooted tree topology• rooted tree topology• branch lengths,• labels on internal nodesSequences• nucleotide• amino acid• evolving rapidly• under selective pressure• evolving at a constant rate in all lineages03-511/711 Computational Genomics and Molecular Biology, Fall 2009 34. Review Fitch’s algorithm.(a) What does it do?(b) How do es it work?(c) What can’t it do?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 45. Tree reconstructionBelow you see trees constructed using 5S small subunit ribosomal RNA genes from fourbacterial spec ies:M10815 Bacillus subtilis (BS)X02713 Lactobacillus viridescene (LV)M58416 Clostridium tyrobutyicum (CT)K02683 Micrococcus luteus (ML)UPGMA Tree:-------------------| || || || 13.14| || || |28.74 ---- ----| | || | 7.45| | || | -- --| 15.6 | || | 8.15 8.15| | | |ML LV BS CT03-511/711 Computational Genomics and Molecular Biology, Fall 2009 5Neighbor-Joining tree:CT//BS /10.872.08\ _ 1.86_ _// \/ \ 43.75/ \24.55 / \/ \/ \/ \/ \LV \\\\\\\\MT(a) Are the topologies the same? If not, how do they differ?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 6(b) In which tree , is the distance from Lactobacillis viridescens to Micrococcus luteus mostdistorted with respect to the original distance matrix derived from the multiple sequencealignment?DISTANCE MATRIX FROM OUTPUTKey for column and row indices:1 BSubtilis2 CTyro3 LViridescens4 MLuteusMatrix 1: Part 11 2 3 4--------------------------------------------------| 1 | 0.00 16.31 26.63 46.18| 2 | 0.00 35.77 54.62| 3 | 0.00 71.66| 4 | 0.00Distance derived from the data: 71.66Distance derived from the NJ tree: 43.75 + 1.86 + 24.55 = 70.16Distance derived from the UPGMA: 15.60 + 13.14 + 28.74 = 57.48(c) Why might UPGMA give a different top ology than NJ for these sequences?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 76. Short questions(a) Using a PSSM instead of a single query sequence in a database search can result inimproved retrieval of distantly related motifs. What might this be the case?(b) Explain why we use pseudocounts to corre ct for the zero frequency case when buildingprofiles.(c) What are the properties that HMM’s can capture that PSSM’s c annot?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 87. Define an HMM H with three states {A, B, C}, and alphabet {1, 2, 3}, initial state probabil-ities πA= 1, πB= 0 and πC= 0 and the following transition and emission probabilities:A B C 1 2 3A 0.25 0.25 0.5 0.5 0.25 0.25B 0.5 0.00 0.5 0.5 0.0 0.5C 0.33 0.33 0.33 0.0 0.5 0.5(a) Draw the state diagram of this HMM and show the transition probabilities.(b) Give all the possible state paths for the sequence O = 1, 3, 1.(c) What is P (O|H)?(d) What is the most probable path? Give the probability of O for this path.(e) For this HMM, would the Viterbi algorithm be a good approximation for Forward algo-rithm. Why or why not?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 98. Signal peptides control the entry of proteins into the secretory pathways. These peptides aretypically less than 40 residues long, contain a charged N-terminal region, a central stretchof hydrophobic residues and a cleavage site preceded by three to seven polar residues. Thefollowing HMM to recognize signal peptides, proposed by Nielsen and Krogh, contains ann-region module, an h-region module and a c-region module:(a) Which of the three modules, if any, recognize subsequences with geometric length dis-tributions?(b) What is the minimum length of the n-region sequences this HMM will recognize ?(c) What is the minimum length of the h-region sequences this HMM will recognize?(d) What is the maximum length of the h-region sequences this HMM will r ecognize?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 109. What are the similarities and differences• between Jukes Cantor and K2P?• between Jukes Cantor and PAM?• between PAM and BLOSUM?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 1110. BLASTFor each of the following, state the impact on(a) the spee d of the heuristic(b) the number of false negatives(c) the number of false positivesin the BLAST 97(a) increase/decrease w(b) increase/decrease T(c) increase/decrease A(d) increase/decrease X(e) increase/decrease S03-511/711 Computational Genomics and Molecular Biology, Fall 2009 1211. Give an expression for the probability of a path of length l through the HMM shown below,where q1is the start state and q5is the end state. (Assume an alphabet with one symbol andall states qiemit that symbol with probability equal to one.)q1 q2 q3 q4q51 1−p1−p
View Full Document