the dash "−" like any other character,ATTACGTACTCCATGATTACGT−−−−CATGoperations for the gap of length 4.In maximal score alignments we treattimes.In terms of evolution this gap is deletion or insertion of length 4.ButIn an edit script we need edithence we charge the s(x,−) costs probably the result of a 44singleGapsone characterHowever, long gaps are less frequentthan short gapsBiological observations:Therefore ......gaps should be considered as singleunitsGap costs should depend on the length of the gap, they should bemonotonously growing, but not asfast as the legth itself.Gaps are usually longer then justg(n) gap cost of a gap of length nGap costs should be subadditive:n=n1+n2Subadditivity:g(n)<=g(n1)+g(n2)If not:Gap is cheaper if it is considered as two successive gaps.e.g. g(n)=9+3nScorematrix for pairs of characterse.g. VT160Gapcosts g(n)andScore= vt(M,M)−g(1)+vt(L,A)−g(2)+vt(V,V)MYL−−VM−ACVV= 6−2 −15 +4= −19−12SCORING GENERAL GLOBAL ALIGNMENT PROBLEMGiven a score matrix and a subadditive gap cost function,alignment.calculate the global maximal score(G)ijThere are three different ways thealignment of S1[1..i] and S2[1..j]can end.(F)(E)ijijfor maximal score alignmentsThe recurrence relationk<=j−1F(i,j)= max {V(k,j)−g(i−k)}k<=i−1with general gap cost function g(n)S(i,j)=max{E(i,j),F(i,j),G(i,j)}whereG(i,j)=S(i−1,j−1)+s(S1(i),S2(j))E(i,j)= max {V(i,k)−g(j−k)}Needleman Wunsch algorithm in amodification by Sankoff.InitialisationS(0,j)=−g(j)E(i,0)=−g(i)E(0,i) undefinedG(i,0) and G(0,j) undefinedG(0,0)=0F(0,j)=−g(j)F(j,0) undefinedS(i,0)=−g(i)INTNER0 1 2 3 4 5 6 701245673Time ComplexityW R I T E R SV*** * * * ********?*GFEThe number of colored spots dependsO(nm)O(nm )O(mn )2O(nm +mn )22 2cubicon the length of the sequences.******* * * * * * *****a gap is extendedg(n)=a+bnHigh costs for opening a gapbut lower costs for extending itThere are two different types ofalignments in the (E) caseATGCTATACGCAATTACGCAATTATGC−−−−−a new gap startsAffine gap costsE(i,j)=E(i,j−1)−bATGC−−−−ATGCTATACGCAATT−The optimal alignment up to positionsE)i and j−1 is of type (open extendextendE(i,j)=max{E(i,j−1),S(i,j−1)−a}−b i and j−1 is of type (G)The optimal alignment up to positionsE(i,j)=S(i,j−1)−a−bACGCAATT S(i,0)=E(i,0)=−a−b*iS(0,j)=F(0,j)=−a−b*jRecurrence relationS(i,j)=max{G(i,j),E(i,j),F(i,j)}E(i,j)=max{E(i,j−1),S(i,j−1)−a}−bG(i,j)=S(i−1,j−1)+s(S1(i),S2(j))F(i,j)=max{F(i−1,j),S(i−1,j)−a}−bInitialisationGotoh’s algorithm3F(i,j)=max{F(i−1,j),S(i−1,j)−a}−bE(i,j)= max{S(i,k)−g(j−k)}k<=j−1F(i,j)= max{S(k,j)−g(i−k)}k<=i−1O(n )2TIME COMPLEXITYGeneral gap costs:O(n )E(i,j)=max{E(i,j−1),S(i,j−1)−a}−bLOCAL CONSERVATIONCTACCTACCAAATTCAGCCCAGTCATTTTTTAGCTAAAGCGGGACATCAAAACCCCCGCTACTTCAGCACACTTGCTTCTACTACTACCACTCCTAATCGCADomainsLocal Alignment...ATTCCAGATG......AT−CTA−−TC...Idea: Just detect conserved regionsin a global alignment.The local alignment problemsegments a1 and a2 of S1 and S2, whosesimilarity (global alignment score)is maximal over all pairs of segmentsfrom S1 and S2. local alignment score of S1 and S2.We use H(S1,S2) to denote the optimalGiven two sequences S1 and S2, findthe best one.We take every pair of segments fromS1 and S2, calculate the correspondingoptimal global alignment and choose Lets say the length of the sequencesare n and m.There are O(n m )2 2pairs of segmentseach global segment alignment takesO(nm) time. Hence this algorithm isO(n m ) ... 3 3slowWe need to go for a better dynamicprogramming approach.Idea:Local AlignmentD(i,j)=optimal global alignmentThe optimal alignment is anextension of one of threeshorter global alignments.score of S1[1..i] and S2[1..j].H(i,j)=optimal local alignmentscore of S1[1..i] and S2[1..j] ?Global Alignment:S2[1..j]i−1ijj−1optimal localalignment ofS1[1..i−1] andS2[1..j−1]optimal localalignment ofS1[1..i] andTH(i,j)=maximal score of all local alignment, thatS1[1..i]S2[1..j]H(i,j)=optimal local alignment score ofS1[1..i] and S2[1..j] does not allow adynamic programming approach.BUTcontain S1(i) and S2(j) as there right most characters.Optimal suffix alignment score of S1[1..i] and S2[1..j]suffix of the prefixS1: TATGCTCAT GCTATS2: CTGCATATPrefixGCTATTCAT TATGCAATTATAC GTCAAWhat about: alignment is the empty alignment.If none of the alignments that end withscore, we say that the optimal localTHE EMPTY ALIGNMENTNo characters are aligned and the scoreis zero.matching S1(i) and S2(j) has a non negativeATCGCTTCCTTA−A−TTCCTATCGCTATCGC−TTCCTA−T...H(i,j−1)+g(1)Note that A is a type (i) extension ofT4 types of H(i,j)−alignmentsthe empty alignment......H(i−1,j)+g(1)Empty 0(i)......H(i−1,j−1)+s(T,A)(ii)(iii)(iv)...assume linear gap costs...maxfor local alignments +s(S1(i),−)+s(−,S2(i))+s(S1(i),S2(j))H(i−1,j−1)H(i,j−1)H(i−1,j)H(i,j) = Recurrence relationSMITH WATERMAN ALGORITHMscore of S1[1..i] and S2[1..j].0S(i,j) = optimal global alignmentINITIALIZATION45670 0 0 0 0 0 0 00000000W R I T E R SVINTNER0 1 2 3 4 5 6 70123E R SVINTNER0 1 2W4 5 6 7012345673R I T******* * * * ********?*The number of colored spots depends0Tabular calculationon the lengths of the sequences.******* * * * * * *ITNER0 1 2 3 4 5 6 701234567W R I T E R SVN* * ******** * * * ********max* * * * ******************* * ***** ** **0on the length of the sequences.The number of colored spots dependsTraceback******* * * *0 050 0 0 0 0 0 06 701200000034560 0 0 0 0 00010 0 0 0 0 0 0 002 3 422C−PCCAPCAC−PCDLACPACSEDCPCD12 4 12 41 4 13 10 4 13 50 13 5 10 14 50 5 13 5 14 1422CAPCSunken town algorithmmismatch = −infinitymatch =1gap = −infinityOptimal local alignment is thelongest common word of the twosequencesTCCATTany sequencea a a a a a a aS1:a1 a2 a3 a4 a5 a6 a7 a8 a9 a10...i i i i i i i i1 2 3 4 5 6 7 8...with i < i < i < i ...1 2 3 4is a subsequence of S1.Example:S1 AT GTT T C G AT CC A T TA Sequencegap = 0T CC A T TS2 CCG AA CA GGTCT AT TC CTCCATTis a common subsequenceof S1 and S2.Optimal local alignment is the longestcommon subsequencemismatch = −infinitymatch = +1S1 AT GTT T C G A as the sequences.q(i)q(j)s(ai,aj)i,jE [s]=qIf the local alignment should be asmall region in the alignment E
View Full Document