CMU BSC 03711 - Problem

Fall 2009 Computational Genomics and Molecular Biology

Problem Set 4

Collaboration is allowed on this homework. You must hand in homeworks individually and list the names of the people you worked with. Due Thursday, December 3rd

1. (a) Verify that the rows of the PAM 1 transition matrix sum to one.

   (b) Verify that $\sum_i p_i P_1[i, i] = 0.99$.

   (c) Verify that $S[j, k] = S[k, j]$, where $S[\cdot, \cdot]$ is the PAM-1 log odds scoring matrix.

2. In this problem, you will construct a BLOSUM60 substitution matrix from the following aligned block:

   1: DSDQQD
   2: DSSQQD
   3: SSQQDD
   4: DDQQDD

   (a) Determine the percent identity between all possible pairs of sequences.

   (b) Cluster the sequences such that each sequence in the cluster is at least 60% identical to some other sequence in the cluster.

   (c) Calculate the observed frequencies ($a_{xy}$) for the clustered sequences, using the BLOSUM method for adjusting for cluster size.

   (d) Calculate the expected frequencies ($a_{xy}$) for the clustered sequences, using the BLOSUM method for adjusting for cluster size.

   (e) Use these frequencies to obtain the log odds matrix, as defined by Henikoff and Henikoff.

3. Substitution matrices:

   (a) Both the PAM and the BLOSUM substitution matrix families are parametrized by evolutionary divergence. Which represents a greater degree of divergence, BLOSUM80 or BLOSUM62? Why?

   (b) Which represents a greater degree of divergence, BLOSUM62 or PAM40? Why?

   (c) What is the interpretation of a positive value in $S_x[i, j]$, the PAM x log odds scoring matrix for a given pair of amino acids i, j?

   (d) What is the interpretation of a negative value in $S_x[i, j]$?

   (e) Consider the PAM30 and PAM250 matrices (shown on the web site). What is the average value on the diagonal of the PAM 30 matrix (i.e., the average of $S_{30}[i, i]$ over all values of i)?

   (f) What is the average value on the diagonal of the PAM 250 matrix?

   (g) Which average diagonal value is larger? How would you explain this in terms of the evolutionary divergence associated with each of the matrices?

   (h) Which specific diagonal values are larger in PAM250 than in PAM30? That is, for which amino acids, i, is $S_{250}[i, i] > S_{30}[i, i]$? What does that suggest about the functional or structural properties of i?

4. Serine and threonine (S and T) are small, hydrophilic amino acids; asparagine, aspartic acid, glutamic acid, and glutamine (N, D, E, and Q) are large, hydrophilic amino acids; and methionine, isoleucine, leucine and valine (M, I, L, and V) are small, hydrophobic amino acids. Based on the entries in the PAM 250 matrix, which of the following substitutions are you more likely to observe in highly diverged sequences? Show the evidence on which you base your answer. Which property do you think is more important to protein structure: size or hydrophobicity?

   (a) The replacement of a small, hydrophilic amino acid with a small, hydrophobic amino acid.

   (b) The replacement of a small, hydrophilic amino acid with a large, hydrophilic amino acid.

5. For ungapped alignments, the expected number of high scoring pairs (HSP’s) with score at least S found in the alignment of two random sequences of length m and n is

   $$E = K m n \, e^{-\lambda S}$$

   where K and λ are constants that can be derived from the theory and depend on the substitution matrix. We can define a “normalized” score

   $$S' = \frac{\lambda S - \ln K}{\ln 2}.$$

   (a) Show that the number of HSP’s with score at least S′ is

   $$E = m n \, 2^{-S'}$$

   (b) Derive an expression for S′ in terms of E.

6. Blast problem 1: For this problem, we will search with the sequence of Keratin 18, which is a member of the Intermediate Filament family. You will perform three BLAST searches with different parameter settings and compare the results.

   These are the basic steps for all three searches:

   (i) Go to the BLASTP web site. The BLAST home page is linked off the course syllabus site. Follow the links to find protein-protein BLAST.

   (ii) The accession ID for the Keratin 18 amino acid sequence in this problem is NP_000215.1. Enter the accession ID in the search box.

   (iii) For all searches, set the following parameters:
      • Under “Choose search set”, select “Non-redundant protein sequences (nr)”.
      • Under “Organism”, select “Lagomorpha (taxid:9975)”.
      • Under “Algorithm Parameters”, set “Expect” to 1.
      • Uncheck “Automatically adjust parameters for short input sequences”.
      • Set max target sequences to 250.
      • Set “Compositional adjustments” to “No adjustment”.
      • Uncheck “Filter for low complexity regions”.
      • Check “Show results in a new window” so that you can use the same query page for all three searches.
      • Use the default for all other parameters, except as specified below.

   (iv) Run each of the three searches specified below.

   (v) Once each search is completed, click on “formatting options” at the top of the results window. Select “Use old BLAST report format”. Set “Graphical overview” to 250 and “Alignments” to 0. (“Descriptions” should already be set to 250.) Click “Reformat”. If you do not set these formatting options correctly, you will get incorrect information or some of the information you need may not be reported.

   (vi) For each search, print out the results page and hand it in with your problem set. To reduce the amount of output you need to print, make sure that “Alignments” is set to zero under the “Format” options.

   (vii) In the reformatted output, you’ll see a color diagram entitled “Distribution of XXX Blast Hits on the Query Sequence.” XXX is the number of matches you obtained. (Note that the website uses the word “hits” ambiguously. I use “matches” to refer to sequences reported in the final output of the search and “hits” to refer to word pairs.)

   Below that, you’ll see a list of “Sequences producing significant alignments”. For each protein matched, you will see a link to the Entrez database record describing this protein, a short one-line description of the protein, the normalized bit score for the match (i.e., the equation you derived in problem 5) and the E-value for the match.

   At the bottom of the results page, you will see a summary of the BLAST parameters used for this
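
The relationship in Problem 5 between the raw score S, the normalized score S′, and the expected count E can be checked numerically. The Python sketch below is a sanity check, not a substitute for the derivation: it evaluates E from the raw score and again from the normalized score and compares the two. The function names are hypothetical, and the values of K, λ, m, and n are illustrative placeholders, not the constants for any particular substitution matrix or database.

```python
import math


def expected_hsps(raw_score, m, n, K, lam):
    """E = K * m * n * exp(-lambda * S), using the raw score S."""
    return K * m * n * math.exp(-lam * raw_score)


def normalized_score(raw_score, K, lam):
    """S' = (lambda * S - ln K) / ln 2, the normalized (bit) score."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def expected_hsps_from_normalized(norm_score, m, n):
    """E = m * n * 2^(-S'), using the normalized score S'."""
    return m * n * 2.0 ** (-norm_score)


# Placeholder constants chosen only for illustration (assumptions).
K, LAM = 0.1, 0.3
M, N = 400, 1_000_000  # hypothetical query and database lengths

for s in (40, 80, 120):
    e_raw = expected_hsps(s, M, N, K, LAM)
    s_norm = normalized_score(s, K, LAM)
    e_norm = expected_hsps_from_normalized(s_norm, M, N)
    # The two E values should agree up to floating-point rounding.
    print(f"S={s:4d}  S'={s_norm:8.2f}  E(raw)={e_raw:.3e}  E(S')={e_norm:.3e}")
```

Because the normalized score absorbs K and λ, the second formula needs only the search-space size m·n, which is why normalized scores can be compared across searches that use different substitution matrices.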

