DOC PREVIEW
CMU BSC 03711 - Study questions

This preview shows page 1-2-3 out of 10 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

03-511/711 Computational Genomics and Molecular Biology, Fall 20111Study questionsThese study problems are intended to help you to review for the final exam. This is not an exhaustive listof the topics covered in the class and there is no guarantee that these questions are representative of thequestions on the final exam. You should also review your class notes, the syllabus and your home workassignments.1. Pairwise sequence alignment(a) Compare, dynamic programming for local, global and semiglobal alignment in terms of whenthey should be used; differences in initialization and termination condition; recursion relation,which scoring functions are suitable.2. Multiple sequence alignment(a) What are the differences between the exact dynamic programming for multiple sequence align-ment and the progressive alignment heuristic?(b) Score the following multiple sequence alignment using sum-of-pairs. Compare the scores andexplain why they differ?AGGAT03-511/711 Computational Genomics and Molecular Biology, Fall 201123. Phylogeny reconstruction(a) Concepts:i. Taxa, leaf nodes, internal nodes, root, branch lengths.ii. Rooted versus unrooted trees; what methods are used to root a tree? For each method, underwhat circumstances can they be applied?iii. Gene trees versus species treesiv. How many trees with k leaves?v. How to enumerate trees and/or traverse tree space.vi. Models of sequence substitution, what are they how are they used?vii. Character data: characters and character statesviii. Distance data: how distances are derived from sequence data; why it is necesary to correctsuch distances; additive and ultrametric distances.(b) For each of the phylogeny reconstruction methods listed below,i. Which types of information can they infer?ii. If a method cannot infer all of the types listed, what biological or algorithmic assumptionsimpose these limitations?iii. Which reconstruction methods are appropriate for each of the types of data listed above?Why?Phylogeny reconstruction methods• UPGMA• Neighbor Joining• Maximum parsimony• Maximum likelihoodInformation inferred by such methods• unrooted tree topology• rooted tree topology• branch lengths,• labels on internal nodesSequences• nucleotide• amino acid03-511/711 Computational Genomics and Molecular Biology, Fall 20113• evolving rapidly• under selective pressure• evolving at a constant rate in all lineages4. Review Fitch’s algorithm.(a) What does it do?(b) How does it work?(c) What can’t it do?5. Tree reconstructionBelow you see trees constructed using 5S small subunit ribosomal RNA genes from four bacterialspecies:M10815 Bacillus subtilis (BS)X02713 Lactobacillus viridescene (LV)M58416 Clostridium tyrobutyicum (CT)K02683 Micrococcus luteus (ML)UPGMA Tree:-------------------| || || || 13.14| || || |28.74 ---- ----| | || | 7.45| | || | -- --| 15.6 | || | 8.15 8.15| | | |ML LV BS CT03-511/711 Computational Genomics and Molecular Biology, Fall 20114Neighbor-Joining tree:CT//BS /10.872.08\ _ 1.86_ _// \/ \ 43.75/ \24.55 / \/ \/ \/ \/ \LV \\\\\\\\MT(a) Are the topologies the same? If not, how do they differ?03-511/711 Computational Genomics and Molecular Biology, Fall 20115(b) In which tree , is the distance from Lactobacillis viridescens to Micrococcus luteus most dis-torted with respect to the original distance matrix derived from the multiple sequence alignment?DISTANCE MATRIX FROM OUTPUTKey for column and row indices:1 BSubtilis2 CTyro3 LViridescens4 MLuteusMatrix 1: Part 11 2 3 4--------------------------------------------------| 1 | 0.00 16.31 26.63 46.18| 2 | 0.00 35.77 54.62| 3 | 0.00 71.66| 4 | 0.00Distance derived from the data: 71.66Distance derived from the NJ tree: 43.75 + 1.86 + 24.55 = 70.16Distance derived from the UPGMA: 15.60 + 13.14 + 28.74 = 57.48(c) Why might UPGMA give a different topology than NJ for these sequences?03-511/711 Computational Genomics and Molecular Biology, Fall 201166. Short questions(a) Using a PSSM instead of a single query sequence in a database search can result in improvedretrieval of distantly related motifs. What might this be the case?(b) Explain why we use pseudocounts to correct for the zero frequency case when building profiles.(c) What are the properties that HMM’s can capture that PSSM’s cannot?(d) How would you use an HMM to calculate a global multiple alignment?(e) What is meant by “unlabeled” data? By “labeled” data?(f) Generally, it is believed that the BLOSUM matrices give better results than the PAM matri-ces. Nevertheless, PAM30 and PAM70 are still used when searching for very closely relatedsequences. Why?(g) What is model surgery? How would you determine whether model surgery is required? If it isrequired, what steps would you take to modify your HMM?(h) Consider the profile HMM shown below:i. M0and M6are silent start and end states. Since these states do not emit a symbol, whatpurpose do they serve?ii. What is the purpose of the I0state?03-511/711 Computational Genomics and Molecular Biology, Fall 201177. Define an HMM H with three states {A, B, C}, and alphabet {1, 2, 3}, initial state probabilities πA=1, πB= 0 and πC= 0 and the following transition and emission probabilities:A B C 1 2 3A 0.25 0.25 0.5 0.5 0.25 0.25B 0.5 0.00 0.5 0.5 0.0 0.5C 0.33 0.33 0.33 0.0 0.5 0.5(a) Draw the state diagram of this HMM and show the transition probabilities.(b) Give all the possible state paths for the sequence O = 1, 3, 1.(c) What is P(O|H )?(d) What is the most probable path? Give the probability of O for this path.(e) For this HMM, would the Viterbi algorithm be a good approximation for Forward algorithm.Why or why not?03-511/711 Computational Genomics and Molecular Biology, Fall 201188. Signal peptides control the entry of proteins into the secretory pathways. These peptides are typi-cally less than 40 residues long, contain a charged N-terminal region, a central stretch of hydrophobicresidues and a cleavage site preceded by three to seven polar residues. The following HMM to rec-ognize signal peptides, proposed by Nielsen and Krogh, contains an n-region module, an h-regionmodule and a c-region module:(a) Which of the three modules, if any, recognize subsequences with geometric length distributions?(b) What is the minimum length of the n-region sequences this HMM will recognize?(c) What is the minimum length of the h-region sequences this HMM will recognize?(d) What is the maximum length of the h-region sequences this HMM will recognize?03-511/711


View Full Document

CMU BSC 03711 - Study questions

Documents in this Course
lecture

lecture

8 pages

Lecture

Lecture

3 pages

Homework

Homework

10 pages

Lecture

Lecture

17 pages

Delsuc05

Delsuc05

15 pages

hmwk1

hmwk1

2 pages

lecture

lecture

6 pages

Lecture

Lecture

10 pages

barnacle4

barnacle4

15 pages

review

review

10 pages

Homework

Homework

10 pages

Midterm

Midterm

12 pages

lecture

lecture

11 pages

lecture

lecture

32 pages

Lecture

Lecture

7 pages

Lecture

Lecture

17 pages

Lecture

Lecture

12 pages

Lecture

Lecture

21 pages

Lecture

Lecture

11 pages

Lecture

Lecture

28 pages

Homework

Homework

13 pages

Logistics

Logistics

11 pages

lecture

lecture

11 pages

Lecture

Lecture

8 pages

Lecture

Lecture

9 pages

lecture

lecture

8 pages

Problem

Problem

6 pages

Homework

Homework

10 pages

Lecture

Lecture

9 pages

Problem

Problem

7 pages

hmwk4

hmwk4

7 pages

Problem

Problem

6 pages

lecture

lecture

16 pages

Problem

Problem

8 pages

Problem

Problem

6 pages

Problem

Problem

13 pages

lecture

lecture

9 pages

Problem

Problem

11 pages

Notes

Notes

7 pages

Lecture

Lecture

7 pages

Lecture

Lecture

10 pages

Lecture

Lecture

9 pages

Homework

Homework

15 pages

Lecture

Lecture

16 pages

Problem

Problem

15 pages

Load more
Download Study questions
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Study questions and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Study questions 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?