03-511/711 Computational Genomics and Molecular Biology, Fall 2005 1Problem Set 1 Due Thursday, September 27thCollaboration is allowed on this homework. You must hand in homework assignments individ-ually and list the names of the people you worked with.You may not use a program to do this homework. Turn in your handwritten answers on theattached sheets. Extra alignment templates will be available on the website.1. Pairwise alignment:(a) Compute the local alignment of “TANGENT with “ENTANGLEMENT”, using the fol-lowing scoring system: matches = 1, mismatches = -2, indels = -2. Show your alignmentmatrix with scores and traceback on the attached alignment template.(b) What is the score of the optimal local alignment? How many different optimal alignmentsare there? Give the alignment(s) here.(c) In the space below, show all suboptimal alignments that have a score one less than theoptimal score. How many are there? Do any of them occur more than once in thealignment matrix?03-511/711 Computational Genomics and Molecular Biology, Fall 2005 22. Pairwise alignment:(a) Compute the local alignment of “TANGENT with “ENTANGLEMENT”, this time usingthe following scoring system: matches = 2, mismatches = -1, indels = -1. Show youralignment matrix with scores and traceback on the attached alignment template.(b) What is the score of the optimal local alignment? How many different optimal alignmentsare there? Show them.03-511/711 Computational Genomics and Molecular Biology, Fall 2005 33. Semiglobal alignment:(a) Compute the semiglobal alignment of “TANGENT with “ENTANGLEMENT”, usingthe following scoring system: matches = 3, mismatches = -2, indels = -2, with nopenalty for aligning indels with the leading characters of “ENTANGLEMENT”. (Thatis, indels can be inserted at the beginning of “TANGENT” with no cost.) Assume apenalty for gaps at the end of both sequences.Show your alignment matrix with scores and traceback on the template provided. Toreduce the amount of work you have to do, low scoring areas of the matrix have beenblacked out on the template. You do not need to fill in those cells.(b) What is the score of the optimal semiglobal alignment? How many different optimalalignments are there? Show them.03-511/711 Computational Genomics and Molecular Biology, Fall 2005 44. Comparing semiglobal and local alignments(a) Did you get the same optimal local alignments in Problems 1 and 2? Why or why not?(b) Compare the local alignments from Problems 1 and 2 with the semiglobal alignmentyou obtained in Problem 3. Under what conditions, will local alignment give the sameresults as semi-global alignment for this problem?(c) Although it sometimes possible to obtain the same results with local and global align-ment, they are fundamentally different methods and will not in general give the sameresult. Why?03-511/711 Computational Genomics and Molecular Biology, Fall 2005 55. Scoring functions:(a) What four properties must a scoring function have to be appropriate for local alignment?Consider the following scoring functions, where M designates a match, m designates a mis-match and g designates an indel. For each of the scoring functions, state whether it is suitablefor pairwise local alignment. If it is not suitable, why not?(b) M = 2, m = 2, g = 2.(c) M = 2, m = −2, g = −2.(d) M = 3, m = 0, g = 0.(e) M = 5, m = −5, g = −2.(f) M = −1, m = −2, g = −2.(g) M = 0, m = 1, g = 1.03-511/711 Computational Genomics and Molecular Biology, Fall 2005 66. You are given two sequences U = u1, u2, . . . , umand T = t1, t2, . . . , tn. For each of thefollowing problems, state(i) which algorithm you would use to solve it: global alignment, local alignment or semi-global alignment.(ii) initialization: show the values in the first row and first column in the dynamic program-ming matrix, a[ ].(iii) give the recurrence relation required to fill in the dynamic programming matrix a[i, j], interms of a gap cost, g, and a scoring matrix, S[i, j], which gives the score of matches andmismatches. State the constraints, if any, that must be imposed on the scoring function,S[ ].(iv) an expression in terms of a[ ], m and n for determining the score of the optimal alignment.(a) As a preprocessing step for sequence assembly, it is necessary to identify pairs of DNAsequence fragments that overlap. Let T and U be two sequence fragments. We wish todetermine whether the end of T overlaps with the beginning of U .i. Algorithm:ii. Initialization:t1t2t3. . .u1u2u3. . .iii. The recurrence relation; constraints on S[ ], if any.iv. The score of the optimal alignment.03-511/711 Computational Genomics and Molecular Biology, Fall 2005 7(b) Given the operations insertion, deletion and substitution, the problem is to determinethe minimum number of operations required to transform T into U .i. Algorithm:ii. Initialization:t1t2t3. . .u1u2u3. . .iii. The recurrence relation; constraints on S[ ], if any.iv. The score of the optimal alignment.03-511/711 Computational Genomics and Molecular Biology, Fall 2005 8(c) T is a cDNA sequence and U is a genomic sequence, both taken from the same bacterialgenome. We wish to find the location of the gene that encoded the cDNA.i. Algorithm:ii. Initialization:t1t2t3. . .u1u2u3. . .iii. The recurrence relation; constraints on S[ ], if any.iv. The score of the optimal alignment.03-511/711 Computational Genomics and Molecular Biology, Fall 2005 9(d) T is a cDNA sequence and U is a genomic sequence, both taken from a higher eukaryote.We wish to find the exon boundaries in the ge nomic DNA.i. Algorithm:ii. Initialization:t1t2t3. . .u1u2u3. . .iii. The recurrence relation; constraints on S[ ], if any.iv. The Score of the optimal alignment.03-511/711 Computational Genomics and Molecular Biology, Fall 2005 107. Given the distance function, M = 0, m = 1, g = 1, what is the relationship (<, ≤, =,etc.) between the quantities X and Y in each of the following cases? Give a one sentenceexplanation of your answer.(a) X = the sum-of-pairs (SP) score of a multiple sequence alignment.Y = the tree alignment score of the same multiple sequence alignment.(b) X = the SP score of a multiple sequence alignment obtained by dynamic programming.Y = the SP score of a multiple alignment of the same sequences obtained by progressivealignment.(c) X = the SP score of the optimal multiple alignment of sequences S1. . . Sk.Y =Pki=1Pkj>iD(Si, Sj), where D(Si, Sj) is the optimal pairwise score between seq-uences
View Full Document