Multiple Sequence Alignment


Terminology

•Motif: the biological object one attempts to model, such as a functional or structural domain, an active site, or a phosphorylation site.
•Pattern: a qualitative motif description based on a regular-expression-like syntax.
•Profile: a quantitative motif description that assigns a degree of similarity to a potential match.

Global Alignment

Global algorithms are often not effective for highly diverged sequences and do not reflect the biological reality that two sequences may share only limited regions of conserved sequence. Sometimes two sequences may be derived from ancient recombination events in which only a single functional domain is shared.

What is Multiple Sequence Alignment (MSA)?
•Multiple sequence alignment (MSA) can be seen as a generalization of pairwise sequence alignment: instead of aligning two sequences, n sequences are aligned simultaneously, where n > 2.
•Definition: a multiple sequence alignment is an alignment of n > 2 sequences obtained by inserting gaps ("-") into the sequences such that the resulting sequences all have length L and can be arranged in a matrix of n rows and L columns, where each column represents a homologous position (each column corresponds to a specific residue in the "prototypical" protein).
•MSA applies to both nucleotide and amino acid sequences.
•To construct a multiple alignment, one may have to introduce gaps in sequences at positions where there were no gaps in the corresponding pairwise alignment. This means that multiple alignments typically contain more gaps than any given pair of aligned sequences.

How to optimize alignment algorithms?

•Use structural information: reading frame, protein structure.
•Sequence elements are not truly independent but are related by phylogenetic descent.
•Sequences often contain highly conserved regions.

Optimize alignment algorithms

•Sequences often contain highly conserved regions. These regions can be used for an initial alignment.
•By analyzing a number of small, independent fragments, the algorithmic complexity can be drastically reduced.

Pairwise Alignment

The alignment of two sequences (DNA or protein) is a relatively straightforward computational problem.

The big-O notation

•One of the most important properties of an algorithm is how its execution time increases as the problem is made larger. By a larger problem, we mean more sequences to align, or longer sequences to align.
•This is the so-called algorithmic (or computational) complexity of the algorithm.
•There is a notation to describe the algorithmic complexity, called the big-O notation.
•If we have a problem size (number of input data points) n, then an algorithm takes O(n) time if the time increases linearly with n.
•If the algorithm needs time proportional to the square of n, then it is O(n^2).

It is important to realize that an algorithm that is quick on small problems may be totally useless on large problems if it has bad O() behavior. As a rule of thumb, one can use the following characterizations, where n is the size of the problem and c is a constant:

O(c)      utopian
O(log n)  excellent
O(n)      very good
O(n^2)    not so good
O(n^3)    pretty bad
O(c^n)    disaster

•To compute an N-wise alignment, the algorithmic complexity is something like O(c^(2n)), where c is a constant and n is the number of sequences.
•This is a big-O disaster!

The best solution is Dynamic Programming.

In pairwise alignments, you have a two-dimensional matrix with the sequences on each axis. The number of operations required to locate the best "path" through the matrix is approximately proportional to the product of the lengths of the two sequences.

A possible general method would be to extend the pairwise alignment method into a simultaneous N-wise alignment, using a complete dynamic-programming algorithm in N dimensions. Algorithmically, this is not difficult to do.

Dynamic Programming

Dynamic Programming is a very general programming technique.
It is applicable when a large search space can be structured into a succession of stages, such that:
•the initial stage contains trivial solutions to sub-problems;
•each partial solution in a later stage can be calculated by recurring on a fixed number of partial solutions in an earlier stage;
•the final stage contains the overall solution.

Multiple Alignments

In theory, making an optimal alignment between two sequences is computationally straightforward (Smith-Waterman algorithm), but aligning a large number of sequences using the same method is almost impossible. The size of the problem increases exponentially with the number of sequences involved (it scales with the product of the sequence lengths).

Optimal Alignment

For a given group of sequences, there is no single "correct" alignment, only an alignment that is "optimal" according to some set of calculations. Determining which alignment is best for a given set of sequences is really up to the judgement of the investigator.

Why do we do multiple alignments?

•To characterize protein families by identifying shared regions of homology in a multiple sequence alignment (this generally happens when a sequence search has revealed homologies to several sequences).
•To determine the consensus sequence of several aligned sequences.

Consensus sequences
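
As a rough illustration of what a consensus sequence is, the sketch below takes an already aligned set of sequences (n rows, all of length L, as in the MSA definition above) and picks the most frequent symbol in each column. The function name `consensus`, the majority-vote rule, and the toy sequences are assumptions made for this example; the slides do not specify how the consensus is computed, and real tools typically use thresholds or ambiguity codes instead of a plain majority.

```python
from collections import Counter


def consensus(alignment):
    """Return a majority-vote consensus for equal-length aligned rows.

    Hypothetical helper for illustration only: gaps ("-") are counted
    like any other symbol, and ties go to the symbol seen first
    (that is, the one from the topmost row).
    """
    if not alignment:
        return ""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length L")

    result = []
    # zip(*alignment) walks the alignment matrix column by column.
    for column in zip(*alignment):
        symbol, _count = Counter(column).most_common(1)[0]
        result.append(symbol)
    return "".join(result)


# Toy alignment (made-up data): 3 rows, 6 columns.
toy_alignment = [
    "ACG-TA",
    "ACGATA",
    "ATG-TA",
]
print(consensus(toy_alignment))
```

Treating the gap as an ordinary symbol keeps the sketch short; a column where most sequences have a gap therefore yields "-" in the consensus.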

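
The pairwise case described under "The best solution is Dynamic Programming" (a two-dimensional matrix with one sequence on each axis) can be sketched as follows. This is a minimal Needleman-Wunsch-style global alignment with a linear gap penalty, not the Smith-Waterman local algorithm named in the slides; the function name `global_align` and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration. It also shows the three dynamic-programming stages: trivial border cells, cells computed from a fixed number of earlier cells, and the final cell holding the overall score.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Initial stage: trivial sub-problems (a prefix aligned against nothing).
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Later stages: each cell depends on exactly three earlier cells,
    # so the total work is proportional to len(a) * len(b).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Final stage: trace one best path back from the bottom-right cell.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("ACGT", "AGT"))
```

When several paths share the best score, the traceback prefers the diagonal move, so only one of the equally optimal alignments is reported.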

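
To make the growth described under "Multiple Alignments" concrete, the following back-of-the-envelope sketch counts the cells of a full N-dimensional dynamic-programming lattice, which is the product of the sequence lengths (plus one per axis for the empty prefix). The helper names `dp_cells` and `predecessors_per_cell` and the sequence length of 100 are assumptions for this example. The count of 2^N - 1 predecessors per cell follows from the fact that each step advances any non-empty subset of the N sequences.

```python
from math import prod


def dp_cells(lengths):
    """Number of cells in a full N-dimensional DP lattice."""
    return prod(n + 1 for n in lengths)


def predecessors_per_cell(num_sequences):
    """Each step advances a non-empty subset of the sequences."""
    return 2 ** num_sequences - 1


# Illustrative only: N sequences of 100 residues each.
for n in range(2, 7):
    cells = dp_cells([100] * n)
    print(n, cells, predecessors_per_cell(n))
```

For two sequences of 100 residues the lattice has 10,201 cells; each additional sequence multiplies that figure by 101, which is why the full N-dimensional method is impractical beyond a handful of sequences.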