Duke STA 294 - Gaps - D609150

Home> Schools> Duke University> Statistical Science (STA) > STA 294> Gaps

Duke STA 294 - Gaps

Pages 36

Download Save

Unformatted text preview:

the dash "−" like any other character,ATTACGTACTCCATGATTACGT−−−−CATGoperations for the gap of length 4.In maximal score alignments we treattimes.In terms of evolution this gap is deletion or insertion of length 4.ButIn an edit script we need edithence we charge the s(x,−) costs probably the result of a 44singleGapsone characterHowever, long gaps are less frequentthan short gapsBiological observations:Therefore ......gaps should be considered as singleunitsGap costs should depend on the length of the gap, they should bemonotonously growing, but not asfast as the legth itself.Gaps are usually longer then justg(n) gap cost of a gap of length nGap costs should be subadditive:n=n1+n2Subadditivity:g(n)<=g(n1)+g(n2)If not:Gap is cheaper if it is considered as two successive gaps.e.g. g(n)=9+3nScorematrix for pairs of characterse.g. VT160Gapcosts g(n)andScore= vt(M,M)−g(1)+vt(L,A)−g(2)+vt(V,V)MYL−−VM−ACVV= 6−2 −15 +4= −19−12SCORING GENERAL GLOBAL ALIGNMENT PROBLEMGiven a score matrix and a subadditive gap cost function,alignment.calculate the global maximal score(G)ijThere are three different ways thealignment of S1[1..i] and S2[1..j]can end.(F)(E)ijijfor maximal score alignmentsThe recurrence relationk<=j−1F(i,j)= max {V(k,j)−g(i−k)}k<=i−1with general gap cost function g(n)S(i,j)=max{E(i,j),F(i,j),G(i,j)}whereG(i,j)=S(i−1,j−1)+s(S1(i),S2(j))E(i,j)= max {V(i,k)−g(j−k)}Needleman Wunsch algorithm in amodification by Sankoff.InitialisationS(0,j)=−g(j)E(i,0)=−g(i)E(0,i) undefinedG(i,0) and G(0,j) undefinedG(0,0)=0F(0,j)=−g(j)F(j,0) undefinedS(i,0)=−g(i)INTNER0 1 2 3 4 5 6 701245673Time ComplexityW R I T E R SV*** * * * ********?*GFEThe number of colored spots dependsO(nm)O(nm )O(mn )2O(nm +mn )22 2cubicon the length of the sequences.******* * * * * * *****a gap is extendedg(n)=a+bnHigh costs for opening a gapbut lower costs for extending itThere are two different types ofalignments in the (E) caseATGCTATACGCAATTACGCAATTATGC−−−−−a new gap startsAffine gap costsE(i,j)=E(i,j−1)−bATGC−−−−ATGCTATACGCAATT−The optimal alignment up to positionsE)i and j−1 is of type (open extendextendE(i,j)=max{E(i,j−1),S(i,j−1)−a}−b i and j−1 is of type (G)The optimal alignment up to positionsE(i,j)=S(i,j−1)−a−bACGCAATT S(i,0)=E(i,0)=−a−b*iS(0,j)=F(0,j)=−a−b*jRecurrence relationS(i,j)=max{G(i,j),E(i,j),F(i,j)}E(i,j)=max{E(i,j−1),S(i,j−1)−a}−bG(i,j)=S(i−1,j−1)+s(S1(i),S2(j))F(i,j)=max{F(i−1,j),S(i−1,j)−a}−bInitialisationGotoh’s algorithm3F(i,j)=max{F(i−1,j),S(i−1,j)−a}−bE(i,j)= max{S(i,k)−g(j−k)}k<=j−1F(i,j)= max{S(k,j)−g(i−k)}k<=i−1O(n )2TIME COMPLEXITYGeneral gap costs:O(n )E(i,j)=max{E(i,j−1),S(i,j−1)−a}−bLOCAL CONSERVATIONCTACCTACCAAATTCAGCCCAGTCATTTTTTAGCTAAAGCGGGACATCAAAACCCCCGCTACTTCAGCACACTTGCTTCTACTACTACCACTCCTAATCGCADomainsLocal Alignment...ATTCCAGATG......AT−CTA−−TC...Idea: Just detect conserved regionsin a global alignment.The local alignment problemsegments a1 and a2 of S1 and S2, whosesimilarity (global alignment score)is maximal over all pairs of segmentsfrom S1 and S2. local alignment score of S1 and S2.We use H(S1,S2) to denote the optimalGiven two sequences S1 and S2, findthe best one.We take every pair of segments fromS1 and S2, calculate the correspondingoptimal global alignment and choose Lets say the length of the sequencesare n and m.There are O(n m )2 2pairs of segmentseach global segment alignment takesO(nm) time. Hence this algorithm isO(n m ) ... 3 3slowWe need to go for a better dynamicprogramming approach.Idea:Local AlignmentD(i,j)=optimal global alignmentThe optimal alignment is anextension of one of threeshorter global alignments.score of S1[1..i] and S2[1..j].H(i,j)=optimal local alignmentscore of S1[1..i] and S2[1..j] ?Global Alignment:S2[1..j]i−1ijj−1optimal localalignment ofS1[1..i−1] andS2[1..j−1]optimal localalignment ofS1[1..i] andTH(i,j)=maximal score of all local alignment, thatS1[1..i]S2[1..j]H(i,j)=optimal local alignment score ofS1[1..i] and S2[1..j] does not allow adynamic programming approach.BUTcontain S1(i) and S2(j) as there right most characters.Optimal suffix alignment score of S1[1..i] and S2[1..j]suffix of the prefixS1: TATGCTCAT GCTATS2: CTGCATATPrefixGCTATTCAT TATGCAATTATAC GTCAAWhat about: alignment is the empty alignment.If none of the alignments that end withscore, we say that the optimal localTHE EMPTY ALIGNMENTNo characters are aligned and the scoreis zero.matching S1(i) and S2(j) has a non negativeATCGCTTCCTTA−A−TTCCTATCGCTATCGC−TTCCTA−T...H(i,j−1)+g(1)Note that A is a type (i) extension ofT4 types of H(i,j)−alignmentsthe empty alignment......H(i−1,j)+g(1)Empty 0(i)......H(i−1,j−1)+s(T,A)(ii)(iii)(iv)...assume linear gap costs...maxfor local alignments +s(S1(i),−)+s(−,S2(i))+s(S1(i),S2(j))H(i−1,j−1)H(i,j−1)H(i−1,j)H(i,j) = Recurrence relationSMITH WATERMAN ALGORITHMscore of S1[1..i] and S2[1..j].0S(i,j) = optimal global alignmentINITIALIZATION45670 0 0 0 0 0 0 00000000W R I T E R SVINTNER0 1 2 3 4 5 6 70123E R SVINTNER0 1 2W4 5 6 7012345673R I T******* * * * ********?*The number of colored spots depends0Tabular calculationon the lengths of the sequences.******* * * * * * *ITNER0 1 2 3 4 5 6 701234567W R I T E R SVN* * ******** * * * ********max* * * * ******************* * ***** ** **0on the length of the sequences.The number of colored spots dependsTraceback******* * * *0 050 0 0 0 0 0 06 701200000034560 0 0 0 0 00010 0 0 0 0 0 0 002 3 422C−PCCAPCAC−PCDLACPACSEDCPCD12 4 12 41 4 13 10 4 13 50 13 5 10 14 50 5 13 5 14 1422CAPCSunken town algorithmmismatch = −infinitymatch =1gap = −infinityOptimal local alignment is thelongest common word of the twosequencesTCCATTany sequencea a a a a a a aS1:a1 a2 a3 a4 a5 a6 a7 a8 a9 a10...i i i i i i i i1 2 3 4 5 6 7 8...with i < i < i < i ...1 2 3 4is a subsequence of S1.Example:S1 AT GTT T C G AT CC A T TA Sequencegap = 0T CC A T TS2 CCG AA CA GGTCT AT TC CTCCATTis a common subsequenceof S1 and S2.Optimal local alignment is the longestcommon subsequencemismatch = −infinitymatch = +1S1 AT GTT T C G A as the sequences.q(i)q(j)s(ai,aj)i,jE [s]=qIf the local alignment should be asmall region in the alignment E

View Full Document


School:
Email:
New Password:
Confirm Password:

Duke STA 294 - Gaps

Sign up for free to view:

Please select your school