DARTMOUTH BIOL 039 - MULTIPLE SEQUENCE ALIGNMENTS

Bob Gross, Bio 39/139

Multiple Sequence Alignments

Importance of MSAs
• display and summarize relationships among sets of sequences
• protein modeling and structure prediction
• molecular evolution
• detection and quantification of sequence motifs
• an alignment of an entire family of genes may provide more information than any pair
• may produce a more accurate alignment
• provides a range of permissible variations
• may provide information about secondary structure

Multiple Alignments Improve Alignment Accuracy
[Figure 9.1]
In this figure, accuracy is based on protein structure as well as sequence alignment.

MSA Using Dynamic Programming
[Figure from “Bioinformatics”, David Mount, CSH Press (2001)]

Limiting Search Volume by Doing Pairwise Comparisons
[Figure from “Bioinformatics”, David Mount, CSH Press (2001); axes: sequence A, sequence B, sequence C]
• the figure shows the volume of the cube that needs to be searched to find the optimal alignment
• the volume is bounded by the optimal alignment of B and C and a projection of the guessed alignment for all three sequences

MSA Strategies
• progressive global alignment, starting with the most alike pair and then adding more sequences
• iterative methods: make initial group alignments and then revise
• local conserved patterns that occur in the same order in all sequences
• statistical methods and probabilistic models
• one challenge is designing a good scoring system

Key Steps in MSA
• identify sequences to align through database searching or other means
• locate the regions of each sequence to include in the alignment
- do not try to align multiple sequences that are substantially different in length
- most programs are designed to align multiple sequences that are roughly the same length
- appropriate alignment regions may be identified through dot matrix comparisons
• assess similarities within a set of sequences using pairwise comparisons
- by examining z-scores
- ideally, the first set of subsequences to align should have z-scores of at least 6
- an alternative to z-scores is to identify sequences in a BLAST search that have an expect value (E) of much less than 1
• run the multiple alignment program
• manually inspect the alignment for problems
- look for regions that have many gaps
- use an alignment visualization tool to identify regions of conserved physicochemical properties across the entire alignment. If such regions don’t exist, look at subregions.
• remove sequences that disrupt the alignment and realign the rest
• after identifying the key residues across all the remaining sequences, try to add back the sequences that were removed, maximizing the number of key features that are preserved

Scoring MSAs
[Figures from “Bioinformatics”, David Mount, CSH Press (2001)]

Hierarchical MSAs
[Figure 9.2; Figure 12.2]

Clustal W
• different scoring matrices for different stages of alignment
• biases for gap insertion location (e.g., not in the middle of an alpha helix)
• allows realignment to fine-tune the overall initial result
• can read a corresponding secondary structure to guide the alignment
• can calculate neighbor-joining trees

Clustal W Weighting
[Figure from “Bioinformatics”, David Mount, CSH Press (2001)]

Using Clustal W Weighting
From “Bioinformatics”, David Mount, CSH Press (2001)

Pair #1:
  sequence A (weight a)  --K--
  sequence B (weight b)  --I--
Pair #2:
  sequence C (weight c)  --L--
  sequence D (weight d)  --V--

Score for matching these two pairs of alignments:

  [ a * c * score(K,L) + a * d * score(K,V) + b * c * score(I,L) + b * d * score(I,V) ] / 4

JalView Output (ClustalW)
[Figure 12.6]

ALSCRIPT Output
[Figure 12.3]

AMAS Output
[Figure 12.4]

Is This Significant for p62?

MSA Strategy
• choose sequences based on known sequence similarity or other criteria
• mask …
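The weighted score from the "Using Clustal W Weighting" slide can be sketched in a few lines of Python. This is only an illustration of the formula as it appears on the slide, not Clustal W's actual code. The function name `weighted_column_score`, the example weights, and the small substitution table are all assumptions made for the example; dividing by the number of residue pairs generalizes the slide's division by 4 (two sequences against two sequences).

```python
from itertools import product

# Illustrative substitution scores for the four residue pairs on the slide.
# These values are placeholders for the example; a real run would look them
# up in whatever scoring matrix the alignment program is using.
EXAMPLE_SCORES = {
    frozenset(("K", "L")): -2,
    frozenset(("K", "V")): -2,
    frozenset(("I", "L")): 2,
    frozenset(("I", "V")): 3,
}


def score(x, y):
    """Look up the (symmetric) substitution score for residues x and y."""
    return EXAMPLE_SCORES[frozenset((x, y))]


def weighted_column_score(column1, column2, score_fn=score):
    """Average weighted score for matching one column from each alignment.

    Each column is a list of (residue, sequence_weight) tuples. Every residue
    in column1 is paired with every residue in column2, each pair's score is
    multiplied by both sequence weights, and the sum is divided by the number
    of pairs (4 in the slide's two-by-two example).
    """
    total = sum(
        w1 * w2 * score_fn(r1, r2)
        for (r1, w1), (r2, w2) in product(column1, column2)
    )
    return total / (len(column1) * len(column2))


# Hypothetical weights a, b, c, d for sequences A, B, C, D.
a, b, c, d = 1.0, 0.8, 0.5, 0.6
pair1_column = [("K", a), ("I", b)]
pair2_column = [("L", c), ("V", d)]
print(weighted_column_score(pair1_column, pair2_column))
```

With every weight set to 1 the function reduces to the plain average of the four substitution scores, which makes the effect of the weights easy to see: down-weighting a sequence shrinks the contribution of every pair it takes part in.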
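The "Key Steps in MSA" slides suggest assessing pairwise similarity by z-score, with at least 6 as the ideal. The slides do not say how the z-scores are computed; one common approach, sketched here as an assumption, is to compare the real alignment score with the scores obtained after shuffling one of the sequences. The scoring parameters (match +1, mismatch -1, linear gap -2), the function names, and the example peptide are hypothetical and chosen only to keep the example short.

```python
import random
import statistics


def global_score(s, t, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffle_z_score(s, t, n_shuffles=100, seed=0):
    """Z-score of the real score against scores for shuffled copies of t.

    Shuffling keeps the length and composition of t but destroys its order,
    so the shuffled scores estimate what chance similarity looks like.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    real = global_score(s, t)
    letters = list(t)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        shuffled_scores.append(global_score(s, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:  # degenerate case: every shuffle scored the same
        return float("inf") if real > mean else 0.0
    return (real - mean) / sd


# Hypothetical 30-residue peptide compared with itself.
peptide = "MKTAYIAKQRQISFVKSHFSRQLEERLGLI"
print(shuffle_z_score(peptide, peptide))
```

A pair would then be a candidate for the first set of sequences to align when its z-score clears the chosen cutoff, such as the value of 6 mentioned on the slide.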

