DOC PREVIEW
Stanford CS 262 - Multiple Sequence Alignments

This preview shows page 1-2-3-20-21-22-41-42-43 out of 43 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Multiple Sequence AlignmentsMLAGAN: progressive alignment of DNAMLAGAN: main stepsMLAGAN: multi-anchoringWhole-genome alignment Human/Mouse/RatInsertion/Deletion Rate AnalysisHeuristics to improve multiple alignmentsIterative RefinementSlide 9Slide 10Slide 11Slide 12Restricted MDPTree RefinementA* for Multiple AlignmentsSlide 16Consistency – T-CoffeeSlide 18T – Coffee LayoutGenerating Primary LibraryPrimary LibraryCombining the librariesLibrary ExtensionRunning TimeT-Coffee compared with other methodsGene RecognitionReadingGene expressionGene structureSlide 30Slide 31Slide 32Approaches to gene findingHMMs for single species gene finding: Generalized HMMsSlide 35Slide 36Slide 37Better way to do it: negative binomialBiology of SplicingConsensus splice sitesSlide 41Splice Site ModelsSlide 43Multiple Sequence AlignmentsAlgorithmsMLAGAN: progressive alignment of DNAGiven N sequences, phylogenetic treeAlign pairwise, in order of the tree (LAGAN)HumanBaboonMouseRatMLAGAN: main stepsGiven a collection of sequences, and a phylogenetic tree1. Find local alignments for every pair of sequences x, y2. Find anchors between every pair of sequences, similar to LAGAN anchoring3. Progressive alignment•Multi-Anchoring based on reconciling the pairwise anchors•LAGAN-style limited-area DP4. Optional refinement stepsMLAGAN: multi-anchoringXZYZX/YZTo anchor the (X/Y), and (Z) alignments:Whole-genome alignment Human/Mouse/RatInsertion/Deletion Rate AnalysisHeuristics to improve multiple alignments•Iterative refinement schemes•A*-based search•Consistency•Simulated Annealing•…Iterative RefinementOne problem of progressive alignment:•Initial alignments are “frozen” even when new evidence comesExample:x: GAAGTTy: GAC-TTz: GAACTGw: GTACTGFrozen!Now clear correct y = GA-CTTIterative RefinementAlgorithm (Barton-Stenberg):1. Align most similar xi, xj2. Align xk most similar to (xixj)3. Repeat 2 until (x1…xN) are aligned4. For j = 1 to N,Remove xj, and realign to x1…xj-1xj+1…xN5. Repeat 4 until convergenceNote: Guaranteed to convergeIterative RefinementFor each sequence y1. Remove y2. Realign y(while rest fixed)xyzx,z fixed projectionallow y to varyIterative RefinementExample: align (x,y), (z,w), (xy, zw):x: GAAGTTAy: GAC-TTAz: GAACTGAw: GTACTGAAfter realigning y:x: GAAGTTAy: G-ACTTA + 3 matchesz: GAACTGAw: GTACTGAIterative RefinementExample not handled well:x: GAAGTTAy1: GAC-TTAy2: GAC-TTAy3: GAC-TTAz: GAACTGAw: GTACTGARealigning any single yi changes nothingRestricted MDPRun MDP, restricted to radius R from mxyzRunning Time: O(2N RN-1 L)Tree RefinementRun 3D-DP, restricted to radius R from m, for each tree nodexyzRunning Time: ~7R2 LNR: RadiusL: Alignment LengthN: Number of SequencesA* for Multiple AlignmentsReview of the A* algorithmvSTARTGOALg(v)h(v)•g(v) is the cost so far•h(v) is an estimate of the minimum cost from v to GOAL•f(v) ≥ g(v) + h(v) is the minimum cost of a path passing by v1. Expand v with the smallest f(v)2. Never expand v, if f(v) ≥ shortest path to the goal found so farA* for Multiple Alignments•Nodes: Cells in the DP matrix•g(v): alignment cost so far•h(v): sum-of-pairs of individual pairwise alignments•Initial minimum alignment cost estimate: sum-of-pairs of global pairwise alignmentsvSTARTGOALg(v)h(v)Consistency – T-Coffeezxyxiyjyj’zkT – Coffee LayoutLALIGN CLUSTALWGenerating Primary LibraryA A A B B B A AC CB B BC C CClustalW Primary Library Lalign Primary Library (10 top scoring non–intersecting Local (Global pairwise alignment) (Pairwise alignment)Library has information for each N(N-1)/2 sequence pairs.Primary LibrarySeq A GARFIELD THE LAST FAT CAT Seq B GARFIELD THE FAST CAT - - - Prim. Weight = 88Seq A GARFIELD THE LAST FA-T CAT Seq C GARFIELD THE VERY FAST --- Prim. Weight = 77Seq A GARFIELD THE LAST FAT CAT Seq D - - - -- - -- THE - - - - FAT CAT Prim. Weight = 100Seq B GARFIELD THE - - - - FAST CAT Seq C GARFIELD THE VERY FAST CAT Prim. Weight = 100Seq B GARFIELD THE FAST CAT Seq D - - - - -- - THE FA-T CAT Prim. Weight = 100Seq C GARFIELD THE VERY FAST CAT Seq D - - - -- -- - THE - - - - FA-T CAT Prim. Weight = 100Combining the librariesSeq A GARFIELD THE LAST FAT CATSeq B GARFIELD THE FAST CAT - - -Primary weight(ClustalW)=88Primary Weight(Lalign)=88W(A(G),B(G)) = 88 + 88 = 176If a pair is duplicated across the two libraries, it is merged into single entry with weight = sum of two weightspairs of residue that did not occur are not present ( weight 0 )Library Extension• Complete extension requires examination of all triplets.• Not all bring information ( eg. A and B through D ).• Weight of a pair =  weights gathered through examination of all triplets involving that pair.Running Time•Complexity of entire procedure:O(N2 * L2) + O(N3*L) + O(N3) + O(N*L2)O(N2 L2) - pair-wise library computationO(N3 L) - library extensionO(N3) - computation of NJ treeO(N L2) - progressive alignment computationWhere:L – average sequence lengthN – number of sequencesT-Coffee compared with other methodsMethod Cat1(81) Cat2(23) Cat3(4) Cat4(12) Cat5(11) Total(141)Dialign 71.0 25.2 35.1 74.7 80.4 61.5ClustalW 78.5 32.2 42.5 65.7 74.3 66.4Prrp 78.6 32.5 50.2 51.1 82.7 66.4T-Coffee 80.7 37.3 52.9 83.2 88.7 72.1Gene RecognitionCredits for slides:Marina AlexanderssonLior PachterSerge SaxonovReading•GENSCAN•EasyGene•SLAM•TwinscanOptional:Chris Burge’s ThesisGene expressionProteinRNADNAtranscriptiontranslationCCTGAGCCAACTATTGATGAAPEPTIDECCUGAGCCAACUAUUGAUGAAGene structureexon1exon2 exon3intron1 intron2transcriptiontranslationsplicingexon = codingintron = non-codingFinding genesStart codonATG5’3’Exon 1Exon 2Exon 3Intron 1 Intron 2Stop codonTAG/TGA/TAASplice sitesApproaches to gene finding•HomologyBLAST, Procrustes.•Ab initioGenscan, Genie, GeneID.•HybridsGenomeScan, GenieEST, Twinscan, SGP, ROSETTA, CEM, TBLASTX, SLAM.HMMs for single species gene finding: Generalized HMMsHMMs for gene findingGTCAGAGTAGCAAAGTAGACACTCCAGTAACGCexonexonexonintronintronintergeneintergeneGHMM for gene findingTAA A A A A A A A A A A AA AAT T T T T T T T T T T T T T TG GGG G G G GGGG G G G GC C C C C C CExon1 Exon2 Exon3durationObserved duration timesBetter way to do it: negative binomial•EasyGene:Prokaryoticgene-finderLarsen TS, Krogh A•Negative binomial with n = 3Biology of


View Full Document

Stanford CS 262 - Multiple Sequence Alignments

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Multiple Sequence Alignments
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Multiple Sequence Alignments and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Multiple Sequence Alignments 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?