CS262 Lecture 3: Sequence Alignment continued, BLAST Professor Serafim Batzoglou TA: Nadine Hussami Scribe: Brian Lao Lecture Date: 1/14/2014 !!!!!!!1!TABLE OF CONTENTS SCORING THE GAPS MORE ACCURATELY..…………………… 2-8 Convex Gap Dynamic Programming …..……………………..... 2-4 Affine Gap Penalty ……………………………………….…....... 4-6 Piecewise Linear Function …………………………………….... 6-7 Bounded Dynamic Programming ……………………………..... 7-8 LINEAR-SPACE ALIGNMENT ………………………………...….. 8-13 Sequences and Subsequences …………………............………….. 8 Hirschberg’s Algorithm ……………………………….....……. 8-13 HEURISTIC LOCAL ALIGNERS ……………………………….... 13-15 BLAST ……………………………………………………….... 13-15 !!!!!!!!!!!!!!!!!!!!!!!!2!START!OF!LECTURE!!SCORING!THE!GAPS!MORE!ACCURATELY! ! ! ! ! ! !!Previous)Model:!!Last!time,!we!defined!gap!penalties!as!a!given!constant!in!any!of!the!2!sequences.!Every!letter!that!goes!to!a!gap!incu rs!a!pen alty,!lets!call!it!d.!!!For!a!gap!of!length!n,!the!penalty!is!n!x!d.!!Figure!1.!!A!graph!illustrating!the!length!of!the!gap!vs.!the!magnitude!of!the!penalty.!As!the!gap!length!increases,!the!penalty!increases!linearly.!How)gaps)biologically)work)in)reality:!!However,!in!biological!sequences,!gaps!of!different!lengths!are!not!linearly!worse!because!gaps!usually!occur!in!bunches.!Longer!gaps!are!not!linearly!worse!than!shorter!gaps.!!Given!that!we’ve!opened!a!gap,!many!biological!mechanisms!will!make!it!easier!to!extend!the!gap.!!For!example,!the!presence!of!certain!repeats,!transp oson s!tha t!have!large!length.!!!How)do)we)handle)this?)Concave)gap)penalty)function)γ(n):!!As!gap!increases,!second!position!less!expensive,!third!position!even!less!expensive!etc.!!!γ(n):!for!all!n,!γ(n!+!1)!J!γ(n)!≤!γ(n)!J!γ(n!–!1)!!3!)Figure!2.!!A!concave!gap!penalty!function!graph!illustrating!the!length!of!the!gap!vs.!the!magnitude!of!the!penalty.!!As!the!gap!length!gets!longer,!the!marginal!penalty!decreases!for!each!successive!extension!of!the!gap.!Why)Original)Dynamic)Programming)does)not)work)for)nonDconstant)gap)scoring:!!Here,!the!original!dynamic!programming!for!global!alignment!does!not!work!if!the!gap!scoring!is!not!constant,!and!is!instead!decreasing!with!every!additional!position.!!!Why?!!Let’s!imagine!we’re!in!a!middle!cell!of!a!given!matrix,!at!position!(i,!j).!!In!the!original!algorithm,!we!would!look!at!th e!ma x!o f!the!three!positions!corresponding!to:!!Up:!F(i,!jJ1)!J!d!Left:!F(iJ1,!j)!J!d!Diagonal!upJleft:!F(iJ1,!jJ1)!+!s(xi,!yj)!However,!now!we!want!d!(the!gap!penalty)!to!be!dependent!on!how!long!the!gap!is.!!The!original!algorithm!does!no t!wo rk!here!b ecause!w e!ca nno t!simply!look!back!at!how!many!positions!we’ve!got!so!far!and!penalize!additional!increments!of!the!gap.!!When!we!“made!the!decision”!to!open!the!gap,!we!did!not!know!that!this!would!make!subsequent!gaps!up!to!the!current!position!cheaper,!and!we!don’t!know!that!the!optimum!did!not!come!from!somewhere!else.!!Thus!we!prefer!not!to!open!the!gap,!and!instead!just!take!the!diagonal!upJleft!direction.!!4!Convex)gap)dynamic)programming:!!Below!is!the!way!that!does!work.1!!!Initialization!and!termin atio n !a re!th e!s am e !as !b efore.!!However!the!iteration!is:!!We!maximize!over!all!possible!gap!lengths!for!gaps!that!end!in!this!position.!!For!example,!if!we!are!at!position!I,j,!then!we!say,!the!max!could!either!come!from!(iJ1,!jJ1),!and!I’m!assuming!the!other!possibilities!are!a!gap!of!a!given!length!ending!here.!!!How!long!could!this!gap!be?!Could!be!length!1!or!2,!etc.,!could!come!from!any!position!k,!from!0!to!iJ1,!and!could!be!from!beginning!of!sequence!to!iJ1,!and!there!would!be!a!gap!in!that!many!positions,!iJk!positions.!!This!would!cost!exactly!gamma!of!iJk.!!I!assume!it!closed!at!this!current!position!and!pay!for!it!Running!time!for!this!algorithm!is!O(N2M)!under!the!assumption!of!N>M,!and!the!Space!is!O(NM).!Compromise:)affine)gap)penalty2:!!We!penalize!a!gap !o f!l ength!n!by!a!gap!open!penalty!of!d!*!number!of!positions!in!the!gap,!and!can!see!why!this!makes!an!approximation!of!a!convex!scoring!function.!!Idea!is!that!its!bad!to!open!a!gap,!but!once!its!open,!can!extend!it!relatively!more!cheaply.!!The!gap!penalty!function!is:!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!1!According!to!lecture,!no!one!actually!uses!this!algorithm,!so!it!is!deemed!to!be!not!very!2!The!affine!gap!penalty!algorithm!is!more!extensively!used!and!is!the!most!common.!!5!Ex!.!an!upper!sequence!with!3!gaps,!a!lower!sequence!with!2!gaps,!where!there’s!10!matches:!the!score!of!the!alignment!is!10m!–!2d!–!3e.!!For!this,!d!and!e!are!constants.!!Figure!3.!!Graph!illustrating!the!length!of!the!gap!vs.!the!magnitude!of!the!gap!penalty.!!There!is!an!initial!gap!opening!penalty!of!d,!and!a!gap!penalty!of!(nJ1)*e!for!extensions!of!the!gap.!Will)have)3)Dynamic)Programming)Matrices)for)Affine)Gap)model:!!One!matrix,!matrix!F,!will!keep!the!scores!of!aligning!first!i!positions!of!sequence!x!to!first!j!positions!of!sequence!y,!under!assumption!th at!xi!aligns!to!yj.!!The!second!matrix,!Matrix!G,!will!be!the!optimal!score!of!alignment!assuming!xi!aligns!to!a!gap!after!yj.!!And!then!there!is!Matrix!H,!which!gives!optimal!score!again!assuming!yj!aligns!to!a!gap!after!xi.!!Matrix!V!will!be!the!max!of!the!above!three!matrices.!!! NeedlemanDWunsch)with)Affine)Gaps:))Why)do)we)need)the)three)matrices?:!!The!reason!for!three!matrices!is!b e ca u se !t h ere !are !sc e n ari o s!where!a!gap!at!i,!j!is!not!optimal!having!opened!the!gap!and!ending!it!there.!!But!if!we!go!one!further!position,!maybe!that!opening!of!the!gap!was!a!good!!6!investment!for!the!overall!score,!and!now!extendin g!the!gap !gives!the!o ptim al!alignment,!but!we!wouldn’t!know!that!unless!we!made!the!initially!subJoptimal!investment!of!opening!the!gap.!!You!don’t!know!at!the!stage!of!the!initial!open!gap,!but!when!extend!then!you !can!see!th at!it’s!the!o ptim al,!so!w e!need!separate!matrices!to!keep!track!of!just!the!scores!with!gaps.!! NeedlemanDWunsch)with)Affine)Gaps:))The)Algorithm:!!Intialization!is!similar!to!before.!!But!iteration!now!involves!keeping!track!of!three!DP!matrices,!and!then!choosing!the!max!out!of!the!three:!!!!
View Full Document