Stanford CS 374 - PROTEIN MULTIPLE SEQUENCE ALIGNMENT


Marina Sirota
CS374
October 19, 2004
PROTEIN MULTIPLE SEQUENCE ALIGNMENT

OUTLINE
• Introduction
• Alignments
• Pairwise vs. Multiple
• DNA vs. Protein
• MUSCLE
• PROBCONS
• Conclusion

INTRODUCTION
1. Sequence Analysis – Look at DNA and protein sequences, searching for clues about structure, function and control
2. Structure Analysis – Examine biological structures to learn more about structure, function and control
3. Functional Analysis – Understand how the sequences and structures lead to the biological function

PAIRWISE SEQUENCE ALIGNMENT (REVIEW)
• The Problem: Given two sequences of letters and a scoring scheme for evaluating matching letters, find the optimal pairing of letters from one sequence to letters of the other sequence
• Basic Idea: The score of the best possible alignment that ends at a given pair of positions (i, j) is the score of the best alignment previous to those two positions plus the score for aligning those two positions

PAIRWISE vs. MULTIPLE
PAIRWISE
• Evaluated by adding match or mismatch scores for aligned pairs and affine gap penalties for unaligned pairs
• O(L^2) time and O(L) space via dynamic programming
MULTIPLE
• Lack of proper objective scoring functions to measure alignment quality
• High computational cost and no efficient algorithm that can be applied
L = sequence length

PROTEIN VS. DNA
• DNA (4 characters)
• Protein (20 characters)
• DNA – 50% similarity
• Protein – 20% similarity
• DNA – fewer sequences to compare
• Protein – many sequences to compare
• DNA aligners need to be able to handle long sequences; protein aligners do not

PROTEIN MULTIPLE SEQUENCE ALIGNMENT
• Note that areas that are considered very similar don't necessarily contain the same amino acids

MOTIVATION
• Find similarity between known and unknown sequences
• Protein sequence similarity implies divergence from a common ancestor and functional similarity

PROBLEM
• Given n sequences and a scoring scheme for evaluating matching letters, find the optimal pairing of letters between the sequences
• Can be done using dynamic programming with time and space complexity O(L^n), which is not practical!
• Need new algorithms and approaches

APPLICATIONS
• Evolutionary research
• Isolation of most relevant regions
• Characterization of protein families

MORE APPLICATIONS
• 3-dimensional structure prediction
• Phylogenetic studies

PAPERS
MUSCLE: a Multiple Sequence Alignment Method with Reduced Time and Space Complexity, by Robert C. Edgar
ProbCons: Probabilistic Consistency-based Multiple Sequence Alignment, by Chuong B. Do, Michael Brudno, and Serafim Batzoglou

MUSCLE – OVERVIEW
• Basic Idea: A progressive alignment is built, to which horizontal refinement is applied
• 3 stages of the algorithm
• At the completion of each stage, a multiple alignment is available and the algorithm can be terminated
• Significant improvement in accuracy and speed

MUSCLE – THE ALGORITHM
Stage 1: Draft Progressive – Builds a progressive alignment
• Similarity of each pair of sequences is computed using either
  • K-mer counting, or
  • Constructing a global alignment and determining the fractional identity of the sequences
• A tree is constructed and a root is identified
• A progressive alignment is built by following the branching order of the tree, yielding a multiple alignment

MUSCLE – PROGRESSIVE ALIGNMENT (figure slide)

MUSCLE – PROFILE-PROFILE ALIGNMENT (figure slide)

MUSCLE – THE ALGORITHM
Stage 2: Improved Progressive – Improves the tree
• Similarity of each pair of sequences is computed using fractional identity from the mutual alignment
• A tree is constructed by applying a clustering method to the distance matrix
• The trees are compared; a set of nodes for which the branching order has changed is identified
• A new alignment is built for the changed nodes; the existing alignment is retained where the order is unchanged

MUSCLE – TREE COMPARISON (figure slide)

MUSCLE – THE ALGORITHM
Stage 3: Refinement – Iterative refinement is performed
• An edge is deleted from the tree, dividing the sequences into two disjoint subsets
• The profile (multiple alignment) of each subset is extracted
• The profiles are re-aligned to each other
• The score is computed; if it has increased, the new alignment is retained, otherwise it is discarded
• The algorithm terminates at convergence

MUSCLE – ITERATIVE REFINEMENT
(Figure: a tree over sequences S, T, U, X, Z, shown twice. Labels: "Delete this edge"; "Realign these resulting profiles to each other".)

MUSCLE
Results:
• O(N^2 + L^2) space and O(N^4 + NL^2) time complexity
• Improvements in selection of heuristics
• Close attention paid to implementation details
• Enables high-throughput applications to achieve good accuracy
• http://www.drive5.com/muscle

PROBCONS – OVERVIEW
• Alignment generation can be directly modeled as a first-order Markov process involving state emissions and transitions
• Uses the maximum expected accuracy alignment method
• Probabilistic consistency is used as a scoring function
• Model parameters are obtained using unsupervised maximum likelihood methods
• Incorporates multiple sequence information in scoring pairwise alignments

PROBCONS – HIDDEN MARKOV MODEL
• Deletion penalties on Match => Gap transitions
• Extension penalties on Gap => Gap transitions
• Match/Mismatch penalties on Match emissions
(Figure: three-state HMM with states MATCH, INSERT X and INSERT Y. MATCH emits the pair (x_i, y_j), INSERT X emits (x_i, ―), INSERT Y emits (―, y_j). Example alignment:
x: ABRACA-DABRA
y: AB-ACARDI---)
• Basic HMM for sequence alignment between two sequences
• M emits two letters, one from each sequence
• Ix emits a letter from x that aligns to a gap
• Iy emits a letter from y that aligns to a gap

PROBCONS – MAXIMUM EXPECTED ACCURACY
• LAZY TEACHER ANALOGY
• 10 students take a 10-question true/false quiz
• How do you make up the answer key?
1. Use the answers of the single best student (Viterbi Algorithm)
2. Use weighted majority rule (Maximum Expected Accuracy)

PROBCONS – MAXIMUM EXPECTED ACCURACY
• Viterbi
  • Picks a single alignment with the highest chance of being completely correct (analogous to Needleman-Wunsch)
  • Mathematically, finds the alignment a which maximizes E_{a*}[1{a = a*}] (maximum probability alignment)
• Maximum Expected Accuracy
  • Picks the alignment with the highest expected number of correct predictions
  • Mathematically, finds the alignment a
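
The lazy teacher analogy above can be made concrete with a small sketch. Everything in it is an illustrative assumption, not material from the slides: the function names, the toy answer sheets (3 students and 4 questions instead of 10 and 10, to keep it short), and the trust weights, which stand in for the probability that a given student's whole sheet is correct. It contrasts copying the single most trusted student with taking a weighted vote on each question.

```python
# Toy data (hypothetical): one row of True/False answers per student.
ANSWERS = [
    [True, True, False, True],   # student 0
    [True, False, True, False],  # student 1
    [False, False, True, True],  # student 2
]
# Hypothetical trust in each student; the weights sum to 1.
WEIGHTS = [0.40, 0.35, 0.25]


def best_student_key(answers, weights):
    """Viterbi-style key: copy the sheet of the single most trusted student."""
    best = max(range(len(answers)), key=lambda s: weights[s])
    return list(answers[best])


def weighted_majority_key(answers, weights):
    """MEA-style key: decide each question by a weighted vote over all students."""
    key = []
    for q in range(len(answers[0])):
        weight_true = sum(w for row, w in zip(answers, weights) if row[q])
        weight_false = sum(w for row, w in zip(answers, weights) if not row[q])
        key.append(weight_true >= weight_false)
    return key


def expected_correct(key, answers, weights):
    """Expected number of correct entries in `key`, treating each student's
    sheet as the true key with probability equal to that student's weight."""
    return sum(
        w * sum(k == a for k, a in zip(key, row))
        for row, w in zip(answers, weights)
    )


viterbi_key = best_student_key(ANSWERS, WEIGHTS)
mea_key = weighted_majority_key(ANSWERS, WEIGHTS)
print("best student key:     ", viterbi_key)
print("weighted majority key:", mea_key)
print("expected correct:", expected_correct(viterbi_key, ANSWERS, WEIGHTS),
      "vs", expected_correct(mea_key, ANSWERS, WEIGHTS))
```

On this toy data the weighted-majority key matches no single student's sheet, so its chance of being entirely correct is lower than the best student's, yet its expected number of correct answers is higher. That is the trade-off between the two criteria in the analogy.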

