DOC PREVIEW
Stanford CS 374 - Protein Multiple Alignment

This preview shows page 1-2-3-4-25-26-27-52-53-54-55 out of 55 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 55 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 55 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 55 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 55 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 55 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 55 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 55 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 55 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 55 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 55 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 55 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 55 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Protein Multiple AlignmentPapersOutlineIntroductionSlide 5Slide 6BackgroundSlide 8BackgroundSlide 10MUSCLESlide 12Three StagesStage 1: Draft ProgressiveSlide 15Slide 16Slide 17Slide 18Stage 2: Improved ProgressiveSlide 20Slide 21Slide 22Slide 23Stage 3: RefinementSlide 25Slide 26Slide 27Slide 28MUSCLE ReviewSlide 30Hidden Markov Models (HMMs)Pairwise AlignmentLazy Teacher AnalogyViterbi vs MEAProbConsNotationSlide 37Step 1: Computation of posterior-probability matricesStep 2: Computation of expected accuraciesStep 2: Computation of expected accuracies (continued)ConsistencyStep 3: Probabilistic consistency transformationStep 3: Probabilistic consistency transformation (continued)Slide 44Step 4: Computation of guide treeStep 5: Progressive alignmentPost-processing step: iterative refinementProbCons overviewSlide 49ConclusionResultsReliability ScoresQuestions?ReferencesReferences (continued)1Protein Multiple Alignmentby Konstantin Davydov2PapersMUSCLE: a multiple sequence alignment method with reduced time and space complexity by Robert C EdgarProbCons: Probabilistic Consistency-based Multiple Sequence Alignment by Chuong B. Do, Mahathi S. P. Mahabhashyam, Michael Brudno, and Serafim Batzoglou3OutlineIntroductionIntroductionBackgroundMUSCLEProbConsConclusion4IntroductionWhat is multiple protein alignment?Given N sequences of amino acids x1, x2 … xN:Insert gaps in each of the xis so that–All sequences have the same length–Score of the global map is maximumACCTGCAACTTCAAACCTGCA--AC--TTCAA5IntroductionMotivationPhylogenetic tree estimationSecondary structure predictionIdentification of critical regions6OutlineIntroductionBackgroundBackgroundMUSCLEProbConsConclusion7BackgroundAligning two sequencesACCTGCAACTTCAAACCTGCA--AC--TTCAA8BackgroundAGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGAAGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTCxyz9BackgroundUnfortunately, this can get very expensiveAligning N sequences of length L requires a matrix of size LN, where each square in the matrix has 2N-1 neighbors This gives a total time complexity of O(2N LN)10OutlineIntroductionBackgroundMUSCLEMUSCLEProbConsConclusion11MUSCLE12MUSCLEBasic Strategy: A progressive alignment is built, to which horizontal refinement is appliedThree stagesAt end of each stage, a multiple alignment is available and the algorithm can be terminated13Three StagesDraft ProgressiveImproved ProgressiveRefinement14Stage 1: Draft ProgressiveSimilarity Measure –Calculated using k-mer counting.ACCATGCGAATGGTCCACAATGk-mer: ATGCCAscore: 3215Stage 1: Draft ProgressiveDistance estimate –Based on the similarities, construct a triangular distance matrix.XXXX0.6XXX0.80.2XX0.30.70.5XX1X2X3X4X3X2X1X416Stage 1: Draft ProgressiveTree construction –From the distance matrix we construct a treeXXXX0.6XXX0.80.2XX0.30.70.5XX1X1X2X3X4X2X3X4X1X4X2X3X3X2X4X1X3X2X4X117Stage 1: Draft Progressive18Stage 1: Draft ProgressiveProgressive alignment –A progressive alignment is built by following the branching order of the tree. This yields a multiple alignment of all input sequences at the root.X1X4X2X3X3X3X2X2X4X4X1X1Alignment of X1, X2, X3, X419Stage 2: Improved ProgressiveAttempts to improve the tree and uses it to build a new progressive alignment. This stage may be iterated.X1X4X3X2XXXXXXXXXXX1X2X3X4X3X2X1X420Stage 2: Improved ProgressiveSimilarity Measure –Similarity is calculated for each pair of sequences using fractional identity computed from their mutual alignment in the current multiple alignmentTCC--AATCA--GATCA--AAG--ATACT--CTGCTCC--AATCA--AA21Stage 2: Improved ProgressiveTree construction –A tree is constructed by computing a Kimura distance matrix and applying a clustering method to itXXXXXXXXXXX1X2X3X4X3X2X1X422Stage 2: Improved ProgressiveTree comparison –The new tree is compared to the previous tree by identifying the set of internal nodes for which the branching order has changed23Stage 2: Improved ProgressiveProgressive alignment –A new progressive alignment is built X2X4X1X3X3X3X1X1X4X4X2X2New Alignment24Stage 3: RefinementPerforms iterative refinement25Stage 3: RefinementChoice of bipartition –An edge is removed from the tree, dividing the sequences into two disjoint subsets X5X4X2X3X1X5X1X4X2X326Stage 3: RefinementProfile Extraction –The multiple alignment of each subset is extracted from current multiple alignment. Columns made up of indels only are removedTCC--AATCA--GATCA--AAG--ATACT--CTGCX1X3X4X5X2TCC--AATCA--AATCA--GAT--CTGCG--ATACTCCAATCAAA27Stage 3: RefinementRe-alignment –The two profiles are then realigned with each other using profile-profile alignment.TCA--GAT--CTGCG--ATACT--CCAAT--CAAATCA--GAT--CTGCG--ATACTCCAATCAAA28Stage 3: RefinementAccept/Reject –The score of the new alignment is computed, if the score is higher than the old alignment, the new alignment is retained, otherwise it is discarded.T--CCAAT--CAAATCA--GAT--CTGCG--ATACTCC--AATCA--GATCA--AAG--ATACT--CTGCNew OldOR29MUSCLE ReviewPerformance–For alignment of N sequences of length L–Space complexity: O(N2+L2)–Time complexity: O(N4+NL2)–Time complexity without refinement: O(N3+NL2)30OutlineIntroductionBackgroundMUSCLEProbConsConclusion31Hidden Markov Models (HMMs)MJ IAGCC-AGC-GCCCAGTIMMMJMMMXY--YX--:X:Y32Pairwise AlignmentViterbi Algorithm–Picks the alignment that is most likely to be the optimal alignment–However: the most likely alignment is not the most accurate–Alternative: find the alignment of maximum expected accuracy33Lazy Teacher Analogy10 students take a 10-question true-false quizHow do you make the answer key?Viterbi Approach: Use the answer sheet of the best studentMEA Approach: Weighted majority vote4. F4. F4. T4. F4. F4. F4. F4. F4. F4. TA-AAB A-AB+B+B+B- B-C34Viterbi vs MEAViterbi–Picks the alignment with the highest chance of being completely correctMaximum Expected Accuracy–Picks the alignment with the highest expected number of correct predictions35ProbConsBasic Strategy: Uses Hidden Markov Models (HMM) to predict the probability of an alignment.Uses Maximum Expected Accuracy instead of the Viterbi alignment.5 steps36NotationGiven N sequencesS = {s1, s2, … sN}a* is the optimal alignment37ProbConsStep 1: Computation of posterior-probability matricesStep 2: Computation of expected accuraciesStep 3:


View Full Document

Stanford CS 374 - Protein Multiple Alignment

Documents in this Course
Probcons

Probcons

42 pages

ProtoMap

ProtoMap

19 pages

Lecture 3

Lecture 3

16 pages

Load more
Download Protein Multiple Alignment
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Protein Multiple Alignment and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Protein Multiple Alignment 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?