1. Inverse Alignment
2. The Papers To Be Presented
3. Sequence Comparison - Alignment
4. Scoring Alignments
5. An Example Of Scoring an Alignment Using a Scoring Matrix
6. Scoring Matrices in Practice
7. Gap Penalties
8. Gap Penalties (Cont’d)
9. Parametric Sequence Alignment
10. Inverse Parametric Alignment
11. Inverse Optimal Alignment
12. Inverse Near-Optimal Alignment
13. Inverse Near-Optimal Alignment (Cont’d)
14. Inverse Unique-Optimal Alignment
15. Inverse Unique-Optimal Alignment (Cont’d)
16. Let There Be Linear Functions …
17. Let There Be Linear Functions … (Example I)
18. Let There Be Linear Functions … (Example II)
19. Linear Programming Problem
20. Reducing The Inverse Alignment Problems To Linear Programming
21. Separation Theorem
22. Separation Theorem (Cont’d)
23. Slide 23
24. Cutting-Plane Algorithm
25. Complexity of Inverse Alignment
26. Application to Global Alignment
27. Application to Global Alignment (Cont’d)
28. Slide 28
29. Computational Results
30. Computational Results (Cont’d)
31. Slide 31
32. CONTRAlign
33. Pair-HMMs for Sequence Alignment
34. Pair-HMMs … (Cont’d)
35. Training Pair-HMMs
36. Generating Alignments Using Pair-HMMs
37. Pair-CRFs
38. Training Pair-CRFs
39. Properties of Pair-CRFs
40. Choice of Model Topology in CONTRAlign
41. Choice of Feature Sets in CONTRAlign
42. Results: Comparison of Model Topologies and Feature Sets
43. Results: Comparison to Modern Sequence Alignment Tools
44. Results: Alignment Accuracy in the “Twilight Zone”
45. Slide 45
46. Slide 46

Inverse Alignment

CS 374
Bahman Bahmani
Fall 2006

The Papers To Be Presented

Sequence Comparison - Alignment

An alignment can be thought of as two sequences that differ because of mutations that occurred during evolution.

    AGGCTATCACCTGACCTCCAGGCCGATGCCC
    TAGCTATCACGACCGCGGTCGATTTGCCCGAC

    -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
     || |||||||  ||||x|  || |||  |||||
    TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

Scoring Alignments

Alignments are based on three basic operations:
1. Substitutions
2. Insertions
3. Deletions

A score is assigned to each single operation (resulting in a scoring matrix and also in gap penalties). Alignments are then scored by adding the scores of their operations.

Standard formulations of string alignment optimize this alignment score.

An Example Of Scoring an Alignment Using a Scoring Matrix

        A   R   N   K
    A   5  -2  -1  -1
    R   -   7  -1   3
    N   -   -   7   0
    K   -   -   -   6

Scoring Matrices in Practice

Some choices for substitution scores are now common, largely due to convention.

Most commonly used amino-acid substitution matrices:
PAM (Percent Accepted Mutation)
BLOSUM (Blocks Amino Acid Substitution Matrix)

[Figure: BLOSUM50 Scoring Matrix]

Gap Penalties

Inclusion of gaps and gap penalties is necessary to obtain the best alignment.

If the gap penalty is too high, gaps will never appear in the alignment:

    AATGCTGC
    ATGCTGCA

If the gap penalty is too low, gaps will appear everywhere in the alignment:

    AATGCTGC----
    A----TGCTGCA

Gap Penalties (Cont’d)

Separate penalties for gap opening and gap extension:
Opening: The cost to introduce a gap
Extension: The cost to elongate a gap

Opening a gap is costly, while extending a gap is cheap.

Unlike scoring matrices, no gap penalties are commonly agreed upon.

    LETVGYW
    ----L
    -5 -1 -1 -1

Parametric Sequence Alignment

For a given pair of strings, the alignment problem is solved for effectively all possible choices of the scoring parameters and penalties (exhaustive search).

A correct alignment is then used to find the best parameter values.

However, this method is very inefficient if the number of parameters is large.

Inverse Parametric Alignment

INPUT: an alignment of a pair of strings.
OUTPUT: a choice of parameters that makes the input alignment an optimal-scoring alignment of its strings.

From a machine learning point of view, this learns the parameters for optimal alignment from training examples of correct alignments.

Inverse Optimal Alignment

Definition (Inverse Optimal Alignment):
INPUT: alignments A_1, A_2, …, A_k of strings, and an alignment scoring function f_w with parameters w = (w_1, w_2, …, w_p).
OUTPUT: values x = (x_1, x_2, …, x_p) for w.
GOAL: each input alignment is an optimal alignment of its strings under f_x.
ATTENTION: This problem may …
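
To make the GOAL condition of Inverse Optimal Alignment concrete, the following minimal Python sketch checks whether a given alignment is optimal under a candidate parameter vector x. It assumes a deliberately simplified scoring function with only three parameters (match, mismatch, and a linear per-column gap penalty) instead of a full substitution matrix with separate opening and extension penalties. The names `alignment_score`, `optimal_score`, and `is_optimal` are hypothetical and are not taken from the papers being presented.

```python
from typing import Tuple

# Assumed parameter vector x = (match, mismatch, gap); a simplification for illustration.
Params = Tuple[float, float, float]


def alignment_score(row_a: str, row_b: str, x: Params) -> float:
    """Score a given alignment (two equal-length rows, '-' marks a gap) under f_x."""
    match, mismatch, gap = x
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            total += gap        # insertion or deletion column
        elif a == b:
            total += match      # identical characters
        else:
            total += mismatch   # substitution
    return total


def optimal_score(s: str, t: str, x: Params) -> float:
    """Best global alignment score of s and t under f_x (standard dynamic program)."""
    match, mismatch, gap = x
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap] + [0] * len(t)
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,  # align s[i-1] with t[j-1]
                         prev[j] + gap,      # gap in t
                         cur[j - 1] + gap)   # gap in s
        prev = cur
    return prev[len(t)]


def is_optimal(row_a: str, row_b: str, x: Params, tol: float = 1e-9) -> bool:
    """True if the given alignment scores as well as the best alignment of its strings."""
    s = row_a.replace("-", "")
    t = row_b.replace("-", "")
    return alignment_score(row_a, row_b, x) >= optimal_score(s, t, x) - tol


if __name__ == "__main__":
    # The two strings from the Gap Penalties slide, aligned with and without gaps.
    print(is_optimal("AATGCTGC-", "-ATGCTGCA", (1, -1, -2)))   # gapped, moderate gap penalty
    print(is_optimal("AATGCTGC", "ATGCTGCA", (1, -1, -2)))     # ungapped, moderate gap penalty
    print(is_optimal("AATGCTGC", "ATGCTGCA", (1, -1, -10)))    # ungapped, very high gap penalty
```

Under these assumed parameters, the gapped alignment is optimal for x = (1, -1, -2) and the ungapped one is not. Raising the gap penalty to -10 makes the ungapped alignment optimal, which mirrors the Gap Penalties observation that a penalty set too high keeps gaps out of the alignment. An inverse alignment method searches for an x that passes this check for every input alignment. The sketch only verifies a candidate x and does not search for one.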