CMU BSC 03711 - Problem

03-511/711 Computational Genomics and Molecular Biology, Fall 2011

Problem Set 3b
Due December 3rd

Collaboration is allowed on this homework. You must hand in homeworks individually and list the names of the people you worked with. Turn in your handwritten answers on the attached sheets.

Monellin is one of several intensely sweet proteins that have been discovered in nature. It is made up of two short peptide chains (A and B) and is found in the fruit of Dioscoreophyllum cumminsii, a West African plant also known as the serendipity berry. Monellin appears to be distantly related to cystatins, a family of cysteine protease inhibitors.

Monellin is a challenging query (1) because it is short and (2) because the sequence divergence between monellin and the cystatins is substantial. You will conduct four searches with this query using different parameter values.

These are the basic steps for all four searches:

1. Go to the BLASTP web site. The BLAST home page is linked off the course syllabus site. Follow the links to find protein-protein BLAST.

2. Enter P02882.2, the accession ID for Monellin Chain B, in the search box.

3. For all searches, set the following parameters:
   • Under “Choose search set”, select “Non-redundant protein sequences (nr)” (the default).
   • Under “Algorithm parameters”, set “Expect threshold” to 200 to make sure you don’t miss any matches.
   • Uncheck “Automatically adjust parameters for short input sequences”.
   • Set “Max Target Sequences” to 500 to make sure you don’t miss any matches.
   • Set “Compositional adjustments” to “No adjustment”.
   • Uncheck “Filter for low complexity regions”.
   • Check “Show results in a new window” so that you can use the same query page for all four searches.
   • Use the default for all other parameters, except as specified below.

4. Run each of the four searches specified below.

5. Once each search is completed, click on “formatting options” at the top of the results window. On the first line, change “HTML” to “Plain text” in the second pull-down menu and check “Use old BLAST report format”. Set “Alignments” to 0. Click “Reformat”. If you do not set these formatting options correctly, you will get incorrect information or some of the information you need may not be reported.

6. For each search, print out the results page and hand it in with your problem set. To reduce the amount of output you need to print, make sure that “Alignments” is set to zero under the “Format” options.

7. In the reformatted output, you’ll see a list of “Sequences producing significant alignments”. For each sequence matched, you will see the database id assigned to this protein, a short one-line description of the protein, the normalized bit score for the match, and the E-value for the match.
   At the bottom of the results page, you will see a summary of the BLAST parameters used for this search (beginning with “Database: All non-redundant ...”). You will compare this information for the four searches.

Search 1: Use the default for all parameters not specified above.

Search 2: Under “Algorithm parameters”, change the matrix to BLOSUM80. Otherwise, use the same parameters as in Search 1.

Search 3: Under “Algorithm parameters”, set the matrix to PAM30. Otherwise, use the same parameters as in Search 1.

Search 4: Under “Choose search set”, enter Plants in the “Organism” box. Reset the substitution matrix to BLOSUM62. Otherwise, use the same parameters as in Search 1.

1. For each search, make a table containing the following values:
   • The matrix used.
   • The length of the database. (Careful, this is not the same as the effective length of the database.)
   • The length of the query. (Again, not the effective length.)
   • The bit score and the E value for Monellin Chain B (P02882.2), i.e., for the query matching with itself.
   • The bit score and the E value for the match to sequence identifier Q10Q47.1, which is a cystatin. Search the results for this identifier to find the match.

2. Information content:
   (a) For each search, calculate the minimum number of bits needed to distinguish a significant alignment from chance.
   (b) For each search, estimate the minimum query length needed to achieve the number of bits you calculated in (a).
   (c) For Searches 2, 3 and 4, is the minimum number of bits required different from the minimum number of bits required for Search 1? In each case, explain why (or why not).
   (d) For which searches, if any, is the query sequence long enough to find significant matches, according to the theory? What characteristic of these searches is responsible for this? Explain your reasoning.

3. Factors that influence bit score and E value
   (a) Compare the bit score of sequence Q10Q47.1 in Searches 2, 3 and 4 with the bit score of Q10Q47.1 in Search 1. Did it increase, decrease or remain unchanged? In each case, explain what you observe in terms of the parameters of the search and what you know about the properties of the bit score.
   (b) Compare the E value of sequence Q10Q47.1 in Searches 2, 3 and 4 with the E value of Q10Q47.1 in Search 1. Did it increase, decrease or remain unchanged? What is the relationship between changes (or lack thereof) in bit score and E value? In each case, explain what you observe in terms of the parameters of the search and what you know about the properties of bit score and E values.
   (c) Compare the bit score of sequence P02882.2 in Searches 2, 3 and 4 with the bit score of P02882.2 in Search 1. Did it increase, decrease or remain unchanged? In each case, explain what you observe in terms of the parameters of the search and what you know about the properties of the bit score.
   (d) Compare the E value of sequence P02882.2 in Searches 2, 3 and 4 with the E value of P02882.2 in Search 1. Did it increase, decrease or remain unchanged? What is the relationship between changes (or lack thereof) in bit score and E value? In each case, explain what you observe in terms of the parameters of the search and what you know about the properties of bit score and E values.
   (e) How many matches rank higher (are more significant) than Q10Q47.1 in Search 2? Do you think these higher ranking matches are all true positives? Why or why not?
   (f) How many matches rank higher (are more
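
The quantities asked for in Questions 2 and 3 can be related through the standard Karlin-Altschul expression for normalized scores, E = m · n · 2^(-S'), where S' is the bit score, m the query length, and n the database length. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, raw lengths stand in for the effective lengths that BLAST actually uses, and `bits_per_position` (the relative entropy of the substitution matrix, in bits) is a value the reader must supply, since none is given in the text.

```python
import math


def bits_to_beat_chance(query_len, db_len, e_threshold=1.0):
    """Bit score S' at which E = m * n * 2**(-S') equals e_threshold.

    Assumption: raw lengths are used; BLAST itself applies effective lengths.
    """
    return math.log2(query_len * db_len / e_threshold)


def expected_hits(bit_score, query_len, db_len):
    """E value implied by a bit score for a search space of size m * n."""
    return query_len * db_len * 2.0 ** (-bit_score)


def min_query_length(bits_needed, bits_per_position):
    """Shortest alignment that can accumulate bits_needed on average.

    bits_per_position is the matrix relative entropy in bits; the caller
    supplies it, because it depends on the matrix chosen for the search.
    """
    return math.ceil(bits_needed / bits_per_position)


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not values from any search.
    m, n = 64, 2 ** 24
    needed = bits_to_beat_chance(m, n)
    print("bits needed:", needed)
    print("E at that score:", expected_hits(needed, m, n))
    print("minimum length at 0.7 bits/position:", min_query_length(needed, 0.7))
```

Because E depends on the bit score only through m · n, two searches with the same bit score can still report different E values when the database length differs, and a matrix with higher relative entropy reaches the required number of bits with a shorter alignment.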

