CMU BSC 03711 - Problem

03-511/711 Computational Genomics and Molecular Biology, Fall 2002

Problem Set 3

Collaboration is allowed on this homework. You must hand in homeworks individually and list the names of the people you worked with. Due in class on Thursday, November 12th.

1. (a) Give biological intuition to explain the fact that the PAM transition matrices are not symmetrical. Give biological intuition to explain the fact that the PAM log odds scoring matrices are symmetrical. No mathematics are necessary to answer this question.

(b) Let $P$ be the PAM-1 transition matrix and $p_j$ be the background frequency of amino acid $j$. Give an expression for the PAM-120 log odds scoring matrix in terms of $P$ and $p_j$.

(c) Can the BLOSUM-62 matrix be derived directly from the BLOSUM-85 matrix? If not, what additional information would you need?

2. (a) The current size of the GenBank non-redundant protein sequence database is approximately 390 million residues. Suppose you search this database for high scoring segments that match segments in a query sequence of 400 amino acids. Roughly how many bits of information are needed to distinguish a significant alignment from a chance alignment in this search?

(b) A substitution matrix can be defined in terms of its information content (relative entropy). The BLOSUM 60 matrix has roughly 0.66 bits of information per position. In the search described above, what is the length of the shortest alignment that can be distinguished from chance using BLOSUM 60?

3. The expected number of high scoring pairs (HSPs) with score at least $S$ found in the alignment of two random sequences of length $m$ and $n$ is

$$E = K m n e^{-\lambda S}$$

where $K$ and $\lambda$ are constants that can be derived from the theory and depend on the substitution matrix.
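To get a feel for how quickly $E$ falls as the score grows, the formula above can be evaluated numerically. The following is a minimal Python sketch, not part of the original problem set: the function name `expected_hsps` is hypothetical, and the values of $K$ and $\lambda$ are placeholders chosen only for illustration (the actual constants for a search appear in the BLAST parameter summary). The lengths $m$ and $n$ reuse the query and database sizes from Problem 2.

```python
import math


def expected_hsps(score, m, n, K, lam):
    """Expected number of HSPs with score >= `score`.

    Implements E = K * m * n * exp(-lam * score) exactly as stated above.
    """
    return K * m * n * math.exp(-lam * score)


# Placeholder constants for illustration only (assumed, not from the
# problem set); real values depend on the substitution matrix in use.
K = 0.041
LAM = 0.267

# Query and database lengths taken from Problem 2.
m = 400
n = 390_000_000

# E shrinks exponentially as the raw score S increases.
for s in (50, 100, 150):
    print(f"S = {s:3d}  E = {expected_hsps(s, m, n, K, LAM):.3e}")
```

Because the dependence on $S$ is exponential while the dependence on $m$ and $n$ is only linear, a modest increase in score offsets a large increase in database size.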
We can define a “normalized” score

$$S' = \frac{\lambda S - \ln K}{\ln 2}.$$

Show that the number of HSPs with score at least $S'$ is

$$E = m n 2^{-S'}.$$

4. For this problem, you are going to do three BLAST searches and compare the results.

These are the basic steps for all three searches:

(i) Go to the BLASTP website. The main NCBI web site is linked from the “data and software tools” webpage, as well as the “other resources” webpage. Follow the links to find BLASTP.

(ii) In class, we attempted to do a BLAST search. Our query was the amino acid sequence for Mouse Tbx-5, a transcription factor involved in limb development. The accession ID for this sequence is P70326. Enter the accession ID in the search box.

(iii) Under “Options for advanced blasting”, use the default for all parameters except as specified below. Leave the “Format” options unchanged.

(iv) Run your BLAST search.

(v) In the results window, you’ll see a list of “Sequences producing significant alignments”. For each protein matched, the link on the left leads to the Entrez database record describing this protein. The link on the right shows the local alignment found. You will need these for the third search only.

(vi) At the bottom of the results page, you will see a summary of the BLAST parameters used for this search (beginning with “Database: All non-redundant Genbank CDS...”). You will compare this summary for the three searches.

Search 1: Run your search with the default parameters.

Search 2: Under “Options for advanced blasting”, change the matrix to PAM 30. Use the default for all other parameters.

Search 3: Under “Options for advanced blasting”, limit your query to Mus musculus. Reset the matrix to BLOSUM62. Use the default for all other parameters.

Print out the results page for all three searches and hand it in with your problem set.
To reduce the output, set the number of alignments to zero under the “Format” options. You will need to look at the alignments from Search 3 to answer question 4a. For Search 3, run your search once with the default “Format” parameters. After you answer question 4a, rerun your BLAST search with the number of alignments set to zero and print out the results. For the first two searches, you will not need to look at the alignments. Simply set the number of alignments to zero and print out the BLAST results. That’s all you need to answer 4b - 4e.

(a) Consider the list of matches found in Search 3. In the list of matches, look at the Entrez records and alignments for every match with an E-value greater than (i.e., less significant than) e−40. Compare them with a few matches with higher significance. Based on what you see, was the default significance threshold $E = 10$ too low, too high or just right? Why?

Compare the parameter summaries from all three BLAST searches to answer the following questions:

(b) How many matches (sequences better than 10.0) were found in each search? Explain the relative quantities in terms of the parameters of the three searches.

(c) Compare the number of extensions in Search 1 and Search 2. Which search has more and why?

(d) Compare the values of gapped Lambda and K for all three searches. Which parameters have the largest impact in determining the values of these constants? Should the values of Lambda and K be recomputed as the database grows? When the scoring matrix is changed? With the length of the query sequence?

(e) Compare the values of the score thresholds $S_2$ in the three searches. Do they differ? What is the justification for this?

5. Consider the set of BLAST programs offered on the main BLAST page at NCBI.
Which BLAST program and which database would you use for each of the following applications? (There may be more than one reasonable answer for each question. Give at least one.) Give a one sentence justification for your answer in each case.

You will find the online BLAST Program Selection Guide helpful in answering this question.

(a) You have a query sequence about which you know nothing, except that it is 333 nucleotides long. You want to identify it.

(b) Your query sequence is a mouse protein roughly 250 residues long. You want to find the sequence for the gene that encodes it.

(c) Your query sequence is a human nucleotide sequence which you suspect may contain a gene. You wish to identify candidate exons.

(d) Your query sequence is a zebrafish nucleotide sequence which you

