UMD CMSC 838T - Sequence Analysis Primer - D2040364

Home> Schools> University of Maryland, College Park> Computer Science (CMSC) > CMSC 838T> Sequence Analysis Primer

DOC PREVIEW

UMD CMSC 838T - Sequence Analysis Primer

School name University of Maryland, College Park

Course Cmsc 838t- Advanced Topics in Programming Languages

Pages 8

This preview shows page 1-2-3 out of 8 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 8 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

WBC Biotechnical Resource Series Richard R. Burgess, Series Editor University of W~sconsin Biotechnology Center Madison, Wisconsin Sequence Analysis Primer Edited by Michael Gribskov and John Devereux G. Grant, ed. Synthetic Peptides: A User's h4. Gribskov and J. DeveIcux, cds. Scquence Analysis Primer Guide W.H. Freeman and Company New York124 StMlLAKlTY AN13 HOMOLOGY the smc typ used to computc a dol iahx) and notc U~C avcratc displilcc- mcnui of the highcst-scoring segmcnls. ]his is convcnicnlly donc using NBRF's RELATE program. Thc avcmgc displrtccrncnls of high-scoring segmenu tend to be multiples of the repeat unit lcngth. For example, RELATE analysis of apMV which con~ns a fundamend 11-rcsiduc repeat, showed additional displacements of 22,44, 55, 66, etc, residues (Boguski et al., 1986b). Likcwise, analysis of cdc23 which condns a 34- rcsiduc rcpt, rcsulrcd in additional displacemcnls of 68 and 102 amino acids (Sikorski et al., 1990). Finally, bolh autocorrclationZ6 (Kubob ct ill.. 1981) and Fouricr mcthods havc bccn used to dctcrminc thc pcriotl of scqucncc rcpc;lLF. Sulnlnnry :lnd Fulurc Dcvclolmcnls Dot rnawix analysis is a sirnplc, yct powcrful, tcchniquc for scqucncc colnpuison. To p;rmpllrasc Collins and Coulson (1987), any colnpiuisoll of two scqucnccs (or of a scqucncc with itscll)should sku1 with a dot plot. WC haw SCCII multiplc insI:wcs ill wllicll hilurc to llcctl Ihisntlvicchia tlCliIyCtl thc idcntilication of imlnrunt qucncc f&?turcs. As uscfd ns (lot nl:ltrix mdlotls arc, thcrc is still considcmblc room for inlprowncnt. No prcwnt ilnplcmnl;Ition kkcs full etlv;inkigc of mtwlcnl computcr hardwarc and grqhical uscr interlace tcchnology. Dot malrix analysis would dsobcncfit from intcption wilhothcrtypcsofdatnanalysis and irnagc display tools. Thc incorporation of multi-lcnglh probcs (Argos. 1957: Argos;ladVingro~i, 11)~0)andcusto~nizulscoring~nalriccs(AI~~l1ul. 1~91)\vo11ltli~n~~rc~vcsc~~si~ivi~yands~~ifici~y. Fin;~lly,~ld~oughdotn;~trix :alalysis will lilll~l;llllcllt:1lIy rclr;lia 8 hcurislic mcdlod of CxploEltory tl;lti\ analysis. lllcabilily tocstimntclllcslrrtistical significanccof thcpattcrns onc ohancs is highly dcsimblc and nligbt bcacconlplishcd using acornbination of ncw and klditional mcdiotls. DI'YAklIC PROGRAMMING METHODS Dot mmix nlclhcds rcly on thc powcr of thc human brain to rwognizc ptlucrns indic:ltivc of sin1il;lrity .and to add gaps to lllc scqucnccs to achicvc an alignmcnt. It is, howcvcr, quitcdiflicult to kc sure that onchas obMincd thc highcst scoring, or optimal, alignment when it is madc by hand. Bccausc uxssmcnts of homology arc almost always madcon thc basis of alignrncnts prcduccd by dgnalnic programming approachcs, wc will discuss the method in dcuil. 3 Autororrel&lion nnalyis is wailhlc in Amor Dawoch'r PC/OBNE,m!rkcld by IntslliGutcricc. Inc. (Urunuin \'icw.CA) r e DYNAMIC PROGRAMMING METHODS 125 A brutc forccapproach of aligning scqucnccs with thcautomatic inscnion of gaps shows hat thc problcm is vcry difficult. Simply comparing two scqucnces, wilhoul gaps, is quivalcnt LO thecompulation that rakcs place in dot matrix analysis, d rquircs time propohonal u, hc p~cducl of the lcngths of the scqucnh (is., timc proportional to NM, where N=length of scqucnc.e 1, and M=ldngth of sequence 2). If Ihc squenccs are assumed lo bc approximatcly the samc lcngth 0, ihcn tirnc proponional LO 1\p is rc- quircd. To account for thc prcscncc of gaps, we would havc to rcpt this calculation 2N timcs to cxaminc thc possibility of gaps in cxh position of cach scqucncc, for Limc proprlional LO N'N. In actuality. Ihc situation is not quitc so bad sincc somc of hcsc alignrncnts would bc nonscnsical, for inskmccaligning gaps will) gaps. An cxplicitcquation has bccn dcrivcd for thc numbcr of comparisons that would bc requircd (Watcrman, 1989). For twoscqucnccs300rcsiducs long about 1O"cornparisons would bcrcquird, which cornparcs favorably to thc estimated 1CPo clcmcnmy pruticlcs in chc univcrsc. Fonunady, thcrc is a rnorc cfficicnc way of aligning scqucnccs based on an approach known as dynamic programming. Nccdlcman and Wunsch (1070)inlrodu~thisapproachtomol~ularbiologis~,whichis,~0thisday, frcqucnlly rcfcrrd to as thc Nccdlcman-Wunsch algorithm. Thc dynamic programming mclhotl rcquircs only timc proportional u, W, and is bad on a sirnplc mlimtion of what thc lcrm optimal ulignmcnt irnplics. Derivation of Dynamic Programming Alignmc~ll lfwcconsidcrthcopdmal,orhighcstscoring,alignmcntshowninA(bclow), wc m brcnk thc aligrimcnl into two parts ils shown in B. Thc ovcrall dignmcnt scorc is thc scorc for the Icft-hand alignrncnt of four bascs plus thc scorc for aligning thc two boscs on thc right. If wc usurnc that the 5 basc alignrncnt in A is optid, wc must concludc Ulat the row basc alignrncnt in Bisalsoanoptimalalignmcnl. Ifitwasnot(forcxamplc,ifwcgavcapositivc scorc for aligning G with T), thc alignrncnt shown in C would givc a highcr scorc than lhconc shown in A. Then C rather than A would bc thc optimal alignment. AATGC AATG c AATGC I II I1+1 I .II AG-GC AG-G C A-GGC A U C InplainEnglishLhcn.thcbcstalignrncntthatcndsatagivcnpairofbascs or rcsiducs is thc bcst alignrncnt of thc scqucnces up LO (hat pint, plus thc scorc for aligning the two additional baxs or rcsiducs. Mathcmadcally, wc126 SLMlLARlTY AND HOMOLOCY DYNAMIC PROGRAMMING METHODS 127 Simple Examplc ojDynamic Programming Alignment Actual alignments arc calculatcd in two stages. First, the two scqycnccs are arr~gcdonalatticcinrnuchthcsamewoyasindotmatrixmethods. Forcach point in thc btticc, thc illignmcnl scorn, S,, is calculalcd. At thc yc timc, UIcpositionoflhebestalignmcntinIhcprcviousroworcolumn,i.c.,Ihcscorc ofthcbcstprcviousalignmentwhich was uscd tocdculaleSv, issorcd. This storcd value is callcd a poinler and is rnprcsentcd by an arrow. In thcsccond stogc, thcalignment is produced by swing at UIC highest alignment =on in the lattice, and building up thc alignrncnl from right to left by following the pointers. This sccond slngc is called UIC traceback. A gmph ofthc pointcrs is somctimes rcfcrrcd IO as a pafh gnph bccausc it delincs thc path hrough UIC Iatticc that cornponds to UIC optimal alignmcnt. Figure 20 shows P simplc cxamplc of a dynamic progmming align-

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3 out of 8 pages.

UMD CMSC 838T - Sequence Analysis Primer

Sign up for free to view:

Please select your school