CMU BSC 03711 - Homework


03-511/711 Computational Genomics and Molecular Biology, Fall 2010

Problem Set 4
Due December 2nd

Collaboration is allowed on this homework. You must hand in your homework individually and list the names of the people you worked with. Turn in your handwritten answers on the attached sheets.

1. (a) Verify that the rows of the PAM-1 transition matrix sum to one.
   (b) Verify that $\sum_i p_i P^1[i, i] = 0.99$.
   (c) Verify that $S[j, k] = S[k, j]$, where $S[\cdot, \cdot]$ is the PAM-1 log odds scoring matrix.

2. The following multiple alignment is a short, variable length motif found in mannose-6-phosphate receptor binding proteins:

       TVQ-VM
       S--SVM
       T--PVL
       S--LVM
       NINT-L

   (a) Suppose you want to construct a profile HMM for this motif. You decide it should have four Match states (not including silent start and end states). Why is four a reasonable choice?
   (b) Label each column in the alignment with the state associated with it.

       TVQ-VM
       S--SVM
       T--PVL
       S--LVM
       NINT-L

   (c) Present the sequences in the labeled multiple alignment as four separate training sequences, each of which has been labeled with its associated states from your HMM.
   (d) You want to estimate the parameters (transition and emission probabilities) of the profile HMM from the sequences labeled with their states in part (c). Give the estimates for the following two parameters, using pseudocounts (b = 1) for both parameters.
       i.  $e_{M_1}(N)$
       ii. $a_{I_1 M_2}$

3. In this problem, you will construct BLOSUM substitution matrices from these two aligned blocks:

       Block A      Block B
       1A: FW       1B: FWYW
       2A: FW       2B: FYYW
       3A: WW       3B: FFYY
       4A: WW       4B: FFWW

   (a) Determine the percent identity between all possible pairs of sequences.
   (b) Cluster the sequences such that each sequence in the cluster is at least 70% identical to some other sequence in the cluster. For each block, show the set of sequences in each cluster.
   (c) Calculate the observed frequencies for FF and FW ($a^{70}_{FF}$ and $a^{70}_{FW}$) for the clustered sequences, using the BLOSUM method for adjusting for cluster size.
   (d) Calculate the expected frequencies for FF and FW ($E^{70}_{FF}$ and $E^{70}_{FW}$) for the clustered sequences, using the BLOSUM method for adjusting for cluster size.
   (e) Use these frequencies to obtain the log odds matrix entries for FF and FW ($S^{70}[FF]$ and $S^{70}[FW]$), as defined in the BLOSUM framework.
   (f) Cluster the sequences such that each sequence in the cluster is at least 80% identical to some other sequence in the cluster. For each block, show the set of sequences in each cluster.
   (g) Calculate the observed frequencies for FF and FW ($a^{80}_{FF}$ and $a^{80}_{FW}$) for the clustered sequences, using the BLOSUM method for adjusting for cluster size.
   (h) Calculate the expected frequencies for FF and FW ($E^{80}_{FF}$ and $E^{80}_{FW}$) for the clustered sequences, using the BLOSUM method for adjusting for cluster size.
   (i) Use these frequencies to obtain the log odds matrix entries for FF and FW ($S^{80}[FF]$ and $S^{80}[FW]$), as defined in the BLOSUM framework.
   (j) Compare your results for the 70% and 80% thresholds. For this data set, does $S[F, F]$ increase or decrease as the threshold increases? Does $S[F, W]$ increase or decrease as the threshold increases? How would you explain the trends you observe in terms of the processes of sequence evolution?

4. Consider the BLOSUM80 and BLOSUM45 matrices (shown on the web site).
   (a) According to the BLOSUM45 matrix, which of the six biochemical groups (sulfhydryl; small, hydrophobic; small, hydrophilic; large, acidic and hydrophilic; aromatic; basic) is most tolerant, on average, of amino acid substitutions within the same biochemical group? Which is least tolerant? Justify your answer numerically.
   (b) Both the PAM and the BLOSUM substitution matrix families are parameterized by evolutionary divergence. Which represents a greater degree of divergence, BLOSUM80 or BLOSUM45?
   (c) For each of the two matrices, what is the mean score for substituting a small, hydrophilic amino acid with a large, hydrophilic amino acid? Compare the mean scores and explain why they differ in terms of the evolutionary divergence of the two matrices.
   (d) For the BLOSUM45 matrix, what is the mean score for substituting a small, hydrophilic amino acid with a small, hydrophobic amino acid? Compare your result with the mean BLOSUM45 score you obtained in the previous question. Explain what you observe in terms of the biophysical properties of the residues.

5. For ungapped alignments, the expected number of high scoring pairs (HSPs) with score at least $S$ found in the alignment of two random sequences of length $m$ and $n$ is

       $E = K m n e^{-\lambda S}$

   where $K$ and $\lambda$ are constants that can be derived from the theory and depend on the substitution matrix. BLAST reports a "normalized" score called the bit score:

       $S' = \frac{\lambda S - \ln K}{\ln 2}$.

   The bit score can be thought of as the amount of information, in bits, associated with a given pairwise alignment. We will use the following results in Problem 6:
   (a) Show that the number of HSPs with score at least $S'$ is

       $E = m n 2^{-S'}$

   (b) Derive an expression for $S'$ in terms of $E$.

6. BLAST: The male platypus Ornithorhynchus anatinus emits venom from the spurs on its hind legs. For this problem, we will search with a constituent of platypus venom called "Ornithorhynchus venom defensin-like peptide C" (OvDLP-C). This venom protein is believed to have evolved from the β defensins, a family of proteins with innate immune functions in mammals.

   OvDLP-C is a challenging query (1) because it is short and (2) because the β defensin family is highly divergent. You will conduct four searches with this query using different parameter values.

   These are the basic steps for all four searches:
   (i) Go to the BLASTP web site. The BLAST home page is linked from the course syllabus site. Follow the links to find protein-protein BLAST.
   (ii) Enter the accession ID P82172.2 in the search box.
   (iii) For all searches, set the following parameters:
       • Under "Choose search set", select "Swissprot".
       • Under "Algorithm Parameters", set "Expect threshold" to 500.
       • Uncheck “Automatically adjust parameters
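
The percent-identity step in Problem 3(a) can be sketched in Python as follows. This is a minimal illustration and not part of the original assignment: it assumes that percent identity means the number of matching positions divided by the alignment length, for ungapped sequences of equal length. The function names `percent_identity` and `pairwise_identities` are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length, ungapped sequences agree.

    Assumption: identity = matching positions / alignment length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def pairwise_identities(block: dict) -> dict:
    """Percent identity for every unordered pair of labeled sequences in one block."""
    return {
        (p, q): percent_identity(block[p], block[q])
        for p, q in combinations(block, 2)
    }


# Block B from Problem 3, keyed by sequence label.
block_b = {"1B": "FWYW", "2B": "FYYW", "3B": "FFYY", "4B": "FFWW"}

if __name__ == "__main__":
    for (p, q), pid in pairwise_identities(block_b).items():
        print(f"{p} vs {q}: {pid:.0f}%")
```

Pairs are only compared within a block, since sequences from different blocks have different lengths and are not aligned to each other.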

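The relationship between the raw-score and bit-score forms of the E-value in Problem 5 can be checked numerically with a short Python sketch. This is an illustration only and does not replace the derivation the problem asks for. The values of `K` and `lam` below are placeholders chosen for the example, not parameters of any particular substitution matrix, and the function names are hypothetical.

```python
import math


def expected_hsps(raw_score: float, m: int, n: int, K: float, lam: float) -> float:
    """E = K * m * n * exp(-lambda * S), using the raw score S."""
    return K * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, K: float, lam: float) -> float:
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def expected_hsps_from_bits(bits: float, m: int, n: int) -> float:
    """E = m * n * 2^(-S'), using the bit score S'."""
    return m * n * 2.0 ** (-bits)


if __name__ == "__main__":
    # Placeholder constants for illustration only (assumed, not from the problem set).
    K, lam = 0.13, 0.32
    m, n, S = 100, 1_000_000, 50

    e_raw = expected_hsps(S, m, n, K, lam)
    e_bits = expected_hsps_from_bits(bit_score(S, K, lam), m, n)
    print(math.isclose(e_raw, e_bits, rel_tol=1e-9))
```

The two forms agree for any positive `K` and `lam`, which is why the bit score lets E-values be computed without knowing the constants of the scoring system.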
