Stanford CS 262 - Lecture 14 - Multiple Sequence Alignments




Multiple Sequence Alignments
CS262 Lecture 9, Win07, Batzoglou

Evolution at the DNA level

[Figure: the sequence ACGGTGCAGTTACCA and a diverged copy AC CAGTCCACCA, annotated with sequence edits (deletion, mutation) and rearrangements (inversion, translocation, duplication).]

Orthology, Paralogy, Inparalogs, Outparalogs

Definition

Given N sequences $x_1, x_2, \ldots, x_N$, insert gaps in each sequence $x_i$ such that:
- all sequences have the same length L
- the score of the global map is maximum

Why multiple alignments matter:
- A faint similarity between two sequences becomes significant if present in many.
- Multiple alignments reveal elements that are conserved among a class of organisms and are therefore important in their common biology.
- The patterns of conservation can help us tell the function of the element.

Scoring Function: Sum Of Pairs

Definition: an induced pairwise alignment is a pairwise alignment induced by the multiple alignment.

Example:

    x: AC-GCGG-C
    y: AC-GC-GAG
    z: GCCGC-GAG

Induces:

    x: ACGCGG-C      x: AC-GCGG-C      y: AC-GCGAG
    y: ACGC-GAG      z: GCCGC-GAG      z: GCCGCGAG

Sum Of Pairs (cont'd)

Heuristic way to incorporate the evolution tree (Human, Mouse, Duck, Chicken).

Weighted SOP: $S(m) = \sum_{k<l} w_{kl}\, s(m^k, m^l)$

A Profile Representation

[Figure: an example multiple alignment and the profile derived from it.]

Given a multiple alignment $M = m_1 \ldots m_n$, replace each column $m_i$ with a profile entry $p_i$:
- frequency of each letter in the alphabet
- number of gaps
- optionally, number of gap openings, extensions, closings

Can think of this as a "likelihood" of each letter in each position.

Multiple Sequence Alignments: Algorithms

Multidimensional DP

Generalization of Needleman-Wunsch:

$S(m) = \sum_i S(m_i)$ (sum of column scores)

$F(i_1, i_2, \ldots, i_N)$: optimal alignment up to $(i_1, \ldots, i_N)$

$F(i_1, i_2, \ldots, i_N) = \max_{\text{all neighbors of cube}} \big( F(\text{nbr}) + S(\text{nbr}) \big)$

Example in 3D (three sequences x, y, z): 7 neighbors per cell.

$$
F(i,j,k) = \max
\begin{cases}
F(i-1,\,j-1,\,k-1) + S(x_i,\, y_j,\, z_k) \\
F(i-1,\,j-1,\,k) + S(x_i,\, y_j,\, -) \\
F(i-1,\,j,\,k-1) + S(x_i,\, -,\, z_k) \\
F(i-1,\,j,\,k) + S(x_i,\, -,\, -) \\
F(i,\,j-1,\,k-1) + S(-,\, y_j,\, z_k) \\
F(i,\,j-1,\,k) + S(-,\, y_j,\, -) \\
F(i,\,j,\,k-1) + S(-,\, -,\, z_k)
\end{cases}
$$

Running time:
1. Size of matrix: $L^N$, where L = length of each sequence and N = number of sequences
2. Neighbors per cell: $2^N - 1$

Therefore: $O(2^N L^N)$

How do gap states generalize? VERY badly:
- Require $2^N - 1$ states, one per combination of gapped/ungapped sequences
- Running time: $O(2^N \times 2^N \times L^N) = O(4^N L^N)$

[Figure: the gap states for three sequences X, Y, Z: X, Y, Z, XY, XZ, YZ, XYZ.]

Progressive Alignment

[Figure: tree over x, y, z, w with internal profiles pxy and pzw and root profile pxyzw.]

When the evolutionary tree is known:
- Align closest first, in the order of the tree.
- In each step, align two sequences x, y, or profiles px, py, to generate a new alignment with associated profile presult.

Weighted version:
- Tree edges have weights proportional to the divergence in that edge.
- The new profile is a weighted average of the two old profiles.

Example profile columns:

            A     C     G     T     -
    px      0.8   0.2   0     0     0
    py      0.6   0     0     0     0.4

Aligning the column px with the column py:

$s(p_x, p_y) = 0.8 \cdot 0.6 \cdot s(A,A) + 0.2 \cdot 0.6 \cdot s(C,A) + 0.8 \cdot 0.4 \cdot s(A,-) + 0.2 \cdot 0.4 \cdot s(C,-)$

Result: pxy = (0.7, 0.1, 0, 0, 0.2)

Aligning the column px with a gap:

$s(p_x, -) = 0.8 \cdot 1.0 \cdot s(A,-) + 0.2 \cdot 1.0 \cdot s(C,-)$

Result: px- = (0.4, 0.1, 0, 0, 0.5)

When the evolutionary tree is unknown:
1. Perform all pairwise alignments.
2. Define a distance matrix D, where D(x, y) is a measure of evolutionary distance based on the pairwise alignment.
3. Construct a tree (UPGMA, Neighbor Joining, or other methods).
4. Align on the tree.

Heuristics to improve alignments

- Iterative refinement schemes
- A*-based search
- Consistency
- Simulated annealing

Iterative Refinement

One problem of progressive alignment: initial alignments are "frozen" even when new evidence comes.

Example:

    x: GAAGTT
    y: GAC-TT     (frozen)

    z: GAACTG
    w: GTACTG

Once z and w are seen, it is clear that the correct alignment is y = GA-CTT.

Algorithm (Barton-Sternberg):
1. For j = 1 to N: remove $x_j$ and realign it to $x_1 \ldots x_{j-1} x_{j+1} \ldots x_N$.
2. Repeat step 1 until convergence.

[Figure: projection with x and z fixed while y is allowed to vary.]

Example: align (x, y), (z, w), (xy, zw):

    x: GAAGTTA
    y: GAC-TTA
    z: GAACTGA
    w: GTACTGA

After realigning y:

    x: GAAGTTA
    y: G-ACTTA     (+3 matches)
    z: GAACTGA
    w: GTACTGA

Example not handled well:

    x:  GAAGTTA
    y1: GAC-TTA
    y2: GAC-TTA
    y3: GAC-TTA
    z:  GAACTGA
    w:  GTACTGA

Realigning any single yi changes nothing.

Consistency

[Figure: sequences x, y, z, in which position xi aligns to zk and zk aligns to yj, alongside a competing position yj'.]

Basic method for applying consistency:
- Compute all pairs of alignments xy, xz, yz, ...
- When aligning x, y during progressive alignment:
  - for each $(x_i, y_j)$, let $s(x_i, y_j) = \text{function\_of}(x_i, y_j, a_{xz}, a_{yz})$
  - align x and y with DP using the modified s(.,.) function

Real-world protein aligners

- MUSCLE: high throughput; one of the best in accuracy
- ProbCons: high accuracy; reasonable speed

MUSCLE at a glance

1. Fast measurement of all pairwise distances between sequences: DDRAFT(x, y) is defined in terms of the number of common k-mers (k ~ 3), in $O(N^2 L \log L)$ time.
2. Build tree TDRAFT based on those distances, with UPGMA.
3. Progressive alignment over TDRAFT, resulting in multiple alignment MDRAFT.
4. Measure new Kimura-based distances D(x, y) based on MDRAFT.
5. Build tree T based on D.
6. Progressive alignment over T, to build M.
7. Iterative refinement; for many rounds, do:
   - Tree partitioning: split M on one branch and realign the two resulting profiles.
   - If the new alignment M' has better sum …
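
The sum-of-pairs score used throughout these notes, together with the induced pairwise alignments from the x, y, z example, can be sketched in Python as follows. This is a minimal illustration, not the lecture's implementation: the function names and the scoring values (match +1, mismatch -1, gap -2, gap-gap 0) are assumptions, and the gaps in the example are written as `-`.

```python
from itertools import combinations

GAP = "-"

def induced_pairwise(row_a, row_b):
    """Project two rows of a multiple alignment, dropping all-gap columns."""
    kept = [(a, b) for a, b in zip(row_a, row_b) if not (a == GAP and b == GAP)]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Hypothetical s(a, b); a gap-gap column contributes nothing."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch

def sum_of_pairs(rows, weights=None):
    """S(m) = sum over k < l of w_kl * s(m^k, m^l), with w_kl = 1 by default."""
    total = 0
    for k, l in combinations(range(len(rows)), 2):
        w = 1 if weights is None else weights[(k, l)]
        total += w * sum(pair_score(a, b) for a, b in zip(rows[k], rows[l]))
    return total

# Rows x, y, z of the example alignment.
alignment = ["AC-GCGG-C", "AC-GC-GAG", "GCCGC-GAG"]

print(induced_pairwise(alignment[0], alignment[1]))  # ('ACGCGG-C', 'ACGC-GAG')
print(sum_of_pairs(alignment))                       # -1 under the assumed scores
```

Passing a `weights` dictionary keyed by row-index pairs gives the weighted variant, where the weights would come from the evolutionary tree.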

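
The profile arithmetic in the progressive alignment example (scoring one profile column against another, then averaging them into a new column) can be checked with a short Python sketch. The substitution function `s` and its values are hypothetical; only the px and py frequencies come from the notes, and equal weights of 0.5 are assumed for the merge.

```python
ALPHABET = "ACGT-"

def s(a, b):
    """Hypothetical substitution score: match +1, mismatch -1, gap -2."""
    if a == "-" and b == "-":
        return 0.0
    if a == "-" or b == "-":
        return -2.0
    return 1.0 if a == b else -1.0

def column_score(p, q, score=s):
    """Expected pair score: sum over letters a, b of p[a] * q[b] * s(a, b)."""
    return sum(p[a] * q[b] * score(a, b) for a in ALPHABET for b in ALPHABET)

def merge(p, q, wp=0.5, wq=0.5):
    """New profile column as a weighted average of the two old columns."""
    return {a: wp * p[a] + wq * q[a] for a in ALPHABET}

px = {"A": 0.8, "C": 0.2, "G": 0.0, "T": 0.0, "-": 0.0}
py = {"A": 0.6, "C": 0.0, "G": 0.0, "T": 0.0, "-": 0.4}
gap_col = {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.0, "-": 1.0}

pxy = merge(px, py)            # approximately A 0.7, C 0.1, - 0.2
px_gap = merge(px, gap_col)    # approximately A 0.4, C 0.1, - 0.5
print(column_score(px, py), pxy)
print(column_score(px, gap_col), px_gap)
```

In the weighted version, `wp` and `wq` would be derived from the tree edge weights rather than fixed at 0.5.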

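
The three-sequence dynamic program, with its 7 neighbors per cell, can be sketched as below. This is an illustration under stated assumptions: the column score is taken to be the sum-of-pairs of a hypothetical pairwise score (match +1, mismatch -1, gap -2, gap-gap 0), gaps are linear rather than affine, and only the optimal score is returned, not the alignment itself.

```python
from itertools import product

GAP = "-"

# The 2^3 - 1 = 7 moves: every nonzero choice of which sequences advance.
MOVES = [m for m in product((0, 1), repeat=3) if any(m)]

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Hypothetical pairwise score; a gap-gap pair contributes nothing."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch

def column_score(col):
    """Sum-of-pairs score of one column of three characters."""
    a, b, c = col
    return pair_score(a, b) + pair_score(a, c) + pair_score(b, c)

def align3_score(x, y, z):
    """Optimal sum-of-pairs score of a global alignment of x, y, z."""
    F = {(0, 0, 0): 0}
    # Lexicographic order guarantees every predecessor cell is already filled.
    for i, j, k in product(range(len(x) + 1), range(len(y) + 1), range(len(z) + 1)):
        if (i, j, k) == (0, 0, 0):
            continue
        best = float("-inf")
        for di, dj, dk in MOVES:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            # A sequence that advances emits its letter; otherwise a gap.
            col = (x[pi] if di else GAP, y[pj] if dj else GAP, z[pk] if dk else GAP)
            best = max(best, F[(pi, pj, pk)] + column_score(col))
        F[(i, j, k)] = best
    return F[(len(x), len(y), len(z))]

print(align3_score("ACG", "ACG", "ACG"))  # 9: three columns, three matching pairs each
```

The table holds (L+1)^3 cells and each cell inspects 7 predecessors, which is the N = 3 case of the O(2^N L^N) bound and shows why this approach is impractical beyond a handful of sequences.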