Multiple Sequence Alignments
CS262 Lecture 9 Win07 Batzoglou

Evolution at the DNA level
Sequence edits: deletion, mutation
  ACGGTGCAGTTACCA
  AC CAGTCCACCA
Rearrangements: inversion, translocation, duplication

Orthology, Paralogy, Inparalogs, Outparalogs

Definition
Given N sequences $x_1, x_2, \ldots, x_N$, insert gaps in each sequence $x_i$ such that:
  All sequences have the same length L
  The score of the global map is maximum
A faint similarity between two sequences becomes significant if it is present in many.
Multiple alignments reveal elements that are conserved among a class of organisms and are therefore important in their common biology.
The patterns of conservation can help us tell the function of the element.

Scoring Function: Sum Of Pairs
Definition: an induced pairwise alignment is a pairwise alignment induced by the multiple alignment.
Example:
  x: AC-GCGG-C
  y: AC-GC-GAG
  z: GCCGC-GAG
Induces:
  x: ACGCGG-C     x: AC-GCGG-C     y: AC-GCGAG
  y: ACGC-GAG     z: GCCGC-GAG     z: GCCGCGAG

Sum Of Pairs (cont'd)
Heuristic way to incorporate the evolution tree (Human, Mouse, Duck, Chicken).
Weighted SOP:
  $S(m) = \sum_{k<l} w_{kl} \, s(m^k, m^l)$

A Profile Representation
[Figure: example multiple alignment and its profile table over A, C, G, T and gap]
Given a multiple alignment $M = m_1 \ldots m_n$, replace each column $m_i$ with a profile entry $p_i$:
  Frequency of each letter in the alphabet
  Number of gaps
  Optional: number of gap openings, extensions, closings
Can think of this as a "likelihood" of each letter in each position.

Multiple Sequence Alignments: Algorithms

Multidimensional DP
Generalization of Needleman-Wunsch:
  $S(m) = \sum_i S(m_i)$ (sum of column scores)
  $F(i_1, i_2, \ldots, i_N)$: optimal alignment up to $(i_1, \ldots, i_N)$
  $F(i_1, i_2, \ldots, i_N) = \max_{\text{all neighbors of cube}} \big( F(\text{nbr}) + S(\text{nbr}) \big)$

Multidimensional DP
Example: in 3D (three sequences), 7 neighbors per cell.
$$F(i,j,k) = \max \begin{cases}
F(i-1, j-1, k-1) + S(x_i, x_j, x_k) \\
F(i-1, j-1, k) + S(x_i, x_j, -) \\
F(i-1, j, k-1) + S(x_i, -, x_k) \\
F(i-1, j, k) + S(x_i, -, -) \\
F(i, j-1, k-1) + S(-, x_j, x_k) \\
F(i, j-1, k) + S(-, x_j, -) \\
F(i, j, k-1) + S(-, -, x_k)
\end{cases}$$

Multidimensional DP
Running Time:
1. Size of matrix: $L^N$, where L = length of each sequence and N = number of sequences
2. Neighbors per cell: $2^N - 1$
Therefore $O(2^N L^N)$

Multidimensional DP
How do gap states generalize? VERY badly!
Require $2^N - 1$ states, one per combination of gapped/ungapped sequences.
Running time: $O(2^N \cdot 2^N \cdot L^N) = O(4^N L^N)$
[Figure: gap states for three sequences: X, Y, Z, XY, YZ, XZ, XYZ]

Progressive Alignment
[Figure: tree over x, y, z, w with internal profiles pxy, pzw, pxyzw]
When the evolutionary tree is known:
  Align closest first, in the order of the tree.
  In each step, align two sequences x, y, or profiles $p_x$, $p_y$, to generate a new alignment with associated profile $p_{result}$.
Weighted version:
  Tree edges have weights proportional to the divergence in that edge.
  New profile is a weighted average of the two old profiles.

Progressive Alignment
Example profiles over (A, C, G, T, -):
  $p_x = (0.8, 0.2, 0, 0, 0)$
  $p_y = (0.6, 0, 0, 0, 0.4)$
Aligning the two columns:
  $s(p_x, p_y) = 0.8 \cdot 0.6 \cdot s(A, A) + 0.2 \cdot 0.6 \cdot s(C, A) + 0.8 \cdot 0.4 \cdot s(A, -) + 0.2 \cdot 0.4 \cdot s(C, -)$
  Result: $p_{xy} = (0.7, 0.1, 0, 0, 0.2)$
Aligning $p_x$ against a gap:
  $s(p_x, -) = 0.8 \cdot 1.0 \cdot s(A, -) + 0.2 \cdot 1.0 \cdot s(C, -)$
  Result: $p_{x-} = (0.4, 0.1, 0, 0, 0.5)$

Progressive Alignment
When the evolutionary tree is unknown:
  Perform all pairwise alignments.
  Define a distance matrix D, where D(x, y) is a measure of evolutionary distance based on the pairwise alignment.
  Construct a tree (UPGMA, Neighbor Joining, other methods).
  Align on the tree.
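The Sum Of Pairs score and the profile-column score from the slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scoring function `toy_s` (match +1, mismatch -1, gap -2, gap against gap 0) and all function names are hypothetical and do not come from the lecture. Equal weights of 0.5 are assumed when merging two profile columns, which reproduces the averaged profile in the example.

```python
from itertools import combinations

GAP = "-"

def toy_s(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Assumed toy scoring function s(a, b); not taken from the lecture."""
    if a == GAP and b == GAP:
        return 0.0          # assumption: gap against gap is neutral
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch

def induced_pairwise(row_a, row_b):
    """Project a multiple alignment onto two rows, dropping all-gap columns."""
    cols = [(p, q) for p, q in zip(row_a, row_b) if not (p == GAP and q == GAP)]
    return "".join(p for p, _ in cols), "".join(q for _, q in cols)

def sum_of_pairs(rows, weights=None, s=toy_s):
    """S(m) = sum over k < l of w_kl * score of the induced alignment of rows k, l."""
    total = 0.0
    for k, l in combinations(range(len(rows)), 2):
        w = 1.0 if weights is None else weights[(k, l)]
        a, b = induced_pairwise(rows[k], rows[l])
        total += w * sum(s(p, q) for p, q in zip(a, b))
    return total

def profile_column_score(p, q, s=toy_s):
    """Expected score of two profile columns: sum over a, b of p[a] * q[b] * s(a, b)."""
    return sum(pa * qb * s(a, b) for a, pa in p.items() for b, qb in q.items())

def merge_columns(p, q, wp=0.5, wq=0.5):
    """Weighted average of two profile columns (equal weights assumed by default)."""
    return {a: wp * p.get(a, 0.0) + wq * q.get(a, 0.0) for a in set(p) | set(q)}

rows = ["AC-GCGG-C", "AC-GC-GAG", "GCCGC-GAG"]   # x, y, z from the SOP example
px = {"A": 0.8, "C": 0.2}
py = {"A": 0.6, GAP: 0.4}
print(induced_pairwise(rows[0], rows[1]))
print(sum_of_pairs(rows))
print(profile_column_score(px, py))
print(merge_columns(px, py))
```

Passing a `weights` dictionary keyed by row-index pairs `(k, l)` turns the same function into the weighted SOP variant; how those weights would be derived from the tree is left open here.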
Heuristics to improve alignments
  Iterative refinement schemes
  A*-based search
  Consistency
  Simulated Annealing

Iterative Refinement
One problem of progressive alignment: initial alignments are "frozen" even when new evidence comes.
Example:
  x: GAAGTT
  y: GAC-TT   (frozen)
  z: GAACTG
  w: GTACTG
Now it is clear that the correct y is GA-CTT.

Iterative Refinement
Algorithm (Barton-Sternberg):
1. For j = 1 to N, remove $x_j$ and realign it to $x_1 \ldots x_{j-1} x_{j+1} \ldots x_N$
2. Repeat step 1 until convergence
[Figure: axes x, y, z; y is allowed to vary while x and z stay fixed (projection)]

Iterative Refinement
Example: align (x, y), (z, w), (xy, zw):
  x: GAAGTTA
  y: GAC-TTA
  z: GAACTGA
  w: GTACTGA
After realigning y:
  x: GAAGTTA
  y: G-ACTTA   (3 matches)
  z: GAACTGA
  w: GTACTGA

Iterative Refinement
Example not handled well:
  x:  GAAGTTA
  y1: GAC-TTA
  y2: GAC-TTA
  y3: GAC-TTA
  z:  GAACTGA
  w:  GTACTGA
Realigning any single $y_i$ changes nothing.

Consistency
[Figure: sequences x, y, z with aligned positions $x_i$, $y_j$ and $z_k$]
Basic method for applying consistency:
  Compute all pairs of alignments xy, xz, yz.
  When aligning x, y during progressive alignment, for each $(x_i, y_j)$ let $s(x_i, y_j)$ be a function of $x_i$, $y_j$, $a_{xz}$, $a_{yz}$.
  Align x and y with DP using the modified s function.

Real-world protein aligners
MUSCLE
  High throughput
  One of the best in accuracy
ProbCons
  High accuracy
  Reasonable speed

MUSCLE at a glance
1. Fast measurement of all pairwise distances between sequences: $D_{DRAFT}(x, y)$ is defined in terms of the number of common k-mers (k ~ 3), in $O(N^2 L \log L)$ time
2. Build tree $T_{DRAFT}$ based on those distances, with UPGMA
3. Progressive alignment over $T_{DRAFT}$, resulting in multiple alignment $M_{DRAFT}$
4. Measure new Kimura-based distances D(x, y) based on $M_{DRAFT}$
5. Build tree T based on D
6. Progressive alignment over T, to build M
7. Iterative refinement; for many rounds, do:
  Tree partitioning: split M on one branch and realign the two resulting profiles
  If the new alignment has a better sum …
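
Step 1 of the MUSCLE outline above defines a draft distance from the number of common k-mers. The slides do not give the exact formula, so the following Python sketch makes one up for illustration: it assumes the distance is one minus the fraction of shared k-mers, normalized by the number of k-mers in the shorter sequence. The function names and this normalization are hypothetical and are not MUSCLE's actual definition.

```python
from collections import Counter

def kmer_counts(seq, k=3):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_distance(x, y, k=3):
    """Assumed draft distance: 1 - shared k-mers / k-mers in the shorter sequence."""
    possible = min(len(x), len(y)) - k + 1
    if possible <= 0:
        return 1.0                      # too short to share any k-mer
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    shared = sum((cx & cy).values())    # multiset intersection keeps minimum counts
    return 1.0 - shared / possible

def distance_matrix(seqs, k=3):
    """Symmetric matrix of pairwise draft distances, no alignment required."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = kmer_distance(seqs[i], seqs[j], k)
    return d

# Ungapped sequences x, y, z, w from the iterative refinement example
seqs = ["GAAGTTA", "GACTTA", "GAACTGA", "GTACTGA"]
for row in distance_matrix(seqs):
    print(["%.2f" % v for v in row])
```

A matrix like this could then be handed to a tree-building method such as UPGMA (step 2); because no alignment is computed, the draft distances are cheap compared with full pairwise DP.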