DOC PREVIEW
CMU BSC 03711 - Study questions

This preview shows page 1-2-3-4 out of 12 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

03-511/711 Computational Genomics and Molecular Biology, Fall 2009 1Study questionsThese study problems are intended to help you to review for the final exam. This is by no meanan exhaustive list of the topics covered in the class and there is no guarantee that these questionsare representative of the questions on the final exam. You should also re view your class notes, thesyllabus and your home work assignments.1. Pairwise sequence alignment(a) Compare, dynamic programming for local, global and semiglobal alignment in termsof when they should be used; differences in initialization and termination condition;recursion relation, which scoring functions are suitable.2. Multiple sequence alignment(a) What are the differences between the exact dynamic programming for multiple sequencealignment and the progressive alignment heuristic?(b) Score the following multiple sequence alignment using sum-of-pairs and tree alignment.Compare the scores and explain why they differ?AGGAT03-511/711 Computational Genomics and Molecular Biology, Fall 2009 23. For each of the phylogeny reconstruction methods listed below,(a) which typ e s of information can they infer?(b) If a method cannot infer all of the types listed, what biological or algorithmic assumptionsimpose these limitations?(c) Which reconstruction methods are appropriate for each of the types of data listed above?Why?Phylogeny reconstruction methods• UPGMA• Neighbor Joining• minimum evolution• maximum parsimony• maximum likelihoodInformation inferred by such methods• unrooted tree topology• rooted tree topology• branch lengths,• labels on internal nodesSequences• nucleotide• amino acid• evolving rapidly• under selective pressure• evolving at a constant rate in all lineages03-511/711 Computational Genomics and Molecular Biology, Fall 2009 34. Review Fitch’s algorithm.(a) What does it do?(b) How do es it work?(c) What can’t it do?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 45. Tree reconstructionBelow you see trees constructed using 5S small subunit ribosomal RNA genes from fourbacterial spec ies:M10815 Bacillus subtilis (BS)X02713 Lactobacillus viridescene (LV)M58416 Clostridium tyrobutyicum (CT)K02683 Micrococcus luteus (ML)UPGMA Tree:-------------------| || || || 13.14| || || |28.74 ---- ----| | || | 7.45| | || | -- --| 15.6 | || | 8.15 8.15| | | |ML LV BS CT03-511/711 Computational Genomics and Molecular Biology, Fall 2009 5Neighbor-Joining tree:CT//BS /10.872.08\ _ 1.86_ _// \/ \ 43.75/ \24.55 / \/ \/ \/ \/ \LV \\\\\\\\MT(a) Are the topologies the same? If not, how do they differ?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 6(b) In which tree , is the distance from Lactobacillis viridescens to Micrococcus luteus mostdistorted with respect to the original distance matrix derived from the multiple sequencealignment?DISTANCE MATRIX FROM OUTPUTKey for column and row indices:1 BSubtilis2 CTyro3 LViridescens4 MLuteusMatrix 1: Part 11 2 3 4--------------------------------------------------| 1 | 0.00 16.31 26.63 46.18| 2 | 0.00 35.77 54.62| 3 | 0.00 71.66| 4 | 0.00Distance derived from the data: 71.66Distance derived from the NJ tree: 43.75 + 1.86 + 24.55 = 70.16Distance derived from the UPGMA: 15.60 + 13.14 + 28.74 = 57.48(c) Why might UPGMA give a different top ology than NJ for these sequences?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 76. Short questions(a) Using a PSSM instead of a single query sequence in a database search can result inimproved retrieval of distantly related motifs. What might this be the case?(b) Explain why we use pseudocounts to corre ct for the zero frequency case when buildingprofiles.(c) What are the properties that HMM’s can capture that PSSM’s c annot?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 87. Define an HMM H with three states {A, B, C}, and alphabet {1, 2, 3}, initial state probabil-ities πA= 1, πB= 0 and πC= 0 and the following transition and emission probabilities:A B C 1 2 3A 0.25 0.25 0.5 0.5 0.25 0.25B 0.5 0.00 0.5 0.5 0.0 0.5C 0.33 0.33 0.33 0.0 0.5 0.5(a) Draw the state diagram of this HMM and show the transition probabilities.(b) Give all the possible state paths for the sequence O = 1, 3, 1.(c) What is P (O|H)?(d) What is the most probable path? Give the probability of O for this path.(e) For this HMM, would the Viterbi algorithm be a good approximation for Forward algo-rithm. Why or why not?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 98. Signal peptides control the entry of proteins into the secretory pathways. These peptides aretypically less than 40 residues long, contain a charged N-terminal region, a central stretchof hydrophobic residues and a cleavage site preceded by three to seven polar residues. Thefollowing HMM to recognize signal peptides, proposed by Nielsen and Krogh, contains ann-region module, an h-region module and a c-region module:(a) Which of the three modules, if any, recognize subsequences with geometric length dis-tributions?(b) What is the minimum length of the n-region sequences this HMM will recognize ?(c) What is the minimum length of the h-region sequences this HMM will recognize?(d) What is the maximum length of the h-region sequences this HMM will r ecognize?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 109. What are the similarities and differences• between Jukes Cantor and K2P?• between Jukes Cantor and PAM?• between PAM and BLOSUM?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 1110. BLASTFor each of the following, state the impact on(a) the spee d of the heuristic(b) the number of false negatives(c) the number of false positivesin the BLAST 97(a) increase/decrease w(b) increase/decrease T(c) increase/decrease A(d) increase/decrease X(e) increase/decrease S03-511/711 Computational Genomics and Molecular Biology, Fall 2009 1211. Give an expression for the probability of a path of length l through the HMM shown below,where q1is the start state and q5is the end state. (Assume an alphabet with one symbol andall states qiemit that symbol with probability equal to one.)q1 q2 q3 q4q51 1−p1−p


View Full Document

CMU BSC 03711 - Study questions

Documents in this Course
lecture

lecture

8 pages

Lecture

Lecture

3 pages

Homework

Homework

10 pages

Lecture

Lecture

17 pages

Delsuc05

Delsuc05

15 pages

hmwk1

hmwk1

2 pages

lecture

lecture

6 pages

Lecture

Lecture

10 pages

barnacle4

barnacle4

15 pages

review

review

10 pages

Homework

Homework

10 pages

Midterm

Midterm

12 pages

lecture

lecture

11 pages

lecture

lecture

32 pages

Lecture

Lecture

7 pages

Lecture

Lecture

17 pages

Lecture

Lecture

12 pages

Lecture

Lecture

21 pages

Lecture

Lecture

11 pages

Lecture

Lecture

28 pages

Homework

Homework

13 pages

Logistics

Logistics

11 pages

lecture

lecture

11 pages

Lecture

Lecture

8 pages

Lecture

Lecture

9 pages

lecture

lecture

8 pages

Problem

Problem

6 pages

Homework

Homework

10 pages

Lecture

Lecture

9 pages

Problem

Problem

7 pages

hmwk4

hmwk4

7 pages

Problem

Problem

6 pages

lecture

lecture

16 pages

Problem

Problem

8 pages

Problem

Problem

6 pages

Problem

Problem

13 pages

lecture

lecture

9 pages

Problem

Problem

11 pages

Notes

Notes

7 pages

Lecture

Lecture

7 pages

Lecture

Lecture

10 pages

Lecture

Lecture

9 pages

Homework

Homework

15 pages

Lecture

Lecture

16 pages

Problem

Problem

15 pages

Load more
Download Study questions
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Study questions and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Study questions 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?