Computational Biology, Part 4
Similarity Matrices/Statistics of Pattern Appearance
Robert F. Murphy
Copyright 1996-2007.
All rights reserved.

Deriving and Using Similarity Matrices

Origin of PAM matrices
- Take an aligned set of closely related proteins
  - 71 groups of proteins that were at least 85% similar
  - Each group of sequences was organized into a phylogenetic tree
    - This creates a model of the order in which substitutions occurred
- Count the number of changes of each amino acid into every other amino acid
  - Each substitution is considered to be an “accepted mutation”: an amino acid change “accepted” by natural selection

- For each group of proteins, find the “exposure to mutation” for each amino acid.
  The exposure to mutation is the product of
  - the frequency of each amino acid in that group, and
  - the number of all amino acid changes per 100 residues (the total number of amino acid changes divided by the combined length of all sequences in that group, times 100)
- For each group, divide the count of changes for each amino acid pair by the exposure to mutation of the “original” amino acid
- Average these values across all groups to create the PAM1 matrix (Point Accepted Mutation at 1% change)

- This table is equivalent to the transition matrix of a first-order Markov model of protein sequence evolution with a 1% overall probability of change
- It is appropriate for comparing sequences separated by an evolutionary distance that would yield changes in 1% of the positions
- Note that PAM1 is not symmetric
- To compare sequences across greater distances, the PAM1 matrix can be multiplied by itself (if the Markov model is correct)

- Squaring PAM1 considers all the ways that an “original” amino acid may have changed over two steps, each with a 1% mutation rate
  - For staying the same: the probability that it did not change in the first step times the probability that it did not change in the second step, plus, summed over every other amino acid, the probability of changing into it in the first step times the probability of changing back in the second step
  - For changing from x -> y: the sum, over every possible first-step outcome z (including z = x and z = y), of the probability of x -> z in the first step times the probability of then changing from z into y (z -> y)
- This gives PAM2 (still not symmetric!)

- PAM1 can be raised to any power (e.g., PAM250)
- The major effect of raising a PAM matrix to a power is to decrease the probability that a particular amino acid is unchanged (and to increase the probabilities of it changing into all others)

Use of PAM matrices
- The sum, over all amino acids, of each diagonal element times the overall frequency of that amino acid gives the expected degree of similarity between two proteins …