Computational Biology, Part 2
Sequence Motifs

Robert F. Murphy
Copyright 1996, 1999-2009. All rights reserved.

Slides from Chapter 4
Ch04_Motifs_mod.ppt

Describing features using frequency matrices

- Goal: Describe a sequence feature (or motif) more quantitatively than possible using consensus sequences
- Need to describe how often particular bases are found in particular positions in a sequence feature

Describing features using frequency matrices

- Definition: For a feature of length m using an alphabet of n characters, a frequency matrix is an n by m matrix in which each element contains the frequency at which a given member of the alphabet is observed at a given position in an aligned set of sequences containing the feature

Frequency matrices (continued)

- Three uses of frequency matrices
  - Describe a sequence feature
  - Calculate probability of occurrence of feature in a random sequence
  - Calculate degree of match between a new sequence and a feature

Matlab Demonstration

    % read some aligned sequences provided with the bioinformatics toolbox
    seqs = fastaread('pf00002.fa');
    seqdisp(seqs);
    startposition=4; endposition=13;
    [P,S] = seqprofile(seqs,'limits',[startposition endposition]);
    disp([' ' sprintf('%2d ',[1:size(P,2)])]);
    for i=1:length(S)
        disp([S(i) ' ' sprintf('%4.3f ',P(i,:))])
    end
    seqlogo(seqs,'startat',startposition,'endat',endposition,'alphabet','aa');

Frequency matrix

Logo Example

Logos for displaying sequence motifs

- http://www.ccrnp.ncifcrf.gov/~toms/sequencelogo.html
- Free logo maker at http://weblogo.berkeley.edu/

Frequency Matrices, PSSMs, and Profiles

- A frequency matrix can be converted to a Position-Specific Scoring Matrix (PSSM) by converting frequencies to scores
- PSSMs also called Position Weight Matrixes (PWMs) or Profiles

Methods for converting frequency matrices to PSSMs

- Using log ratio of observed to expected

  $$\mathrm{score}(j,i) = \log \frac{m(j,i)}{f(j)}$$

  where m(j,i) is the frequency of character j observed at position i and f(j) is the overall frequency of character j (usually in some large set of sequences)
- Using amino acid substitution matrix (Dayhoff similarity matrix) [see later]

Pseudo-counts

- How do we get a score for a position with zero counts for a particular character? Can’t take log(0).
- Solution: add a small number to all positions with zero frequency

Finding occurrences of a sequence feature using a Profile

- As with finding occurrences of a consensus sequence, we consider all positions in the target sequence as candidate matches
- For each position, we calculate a score by “looking up” the value corresponding to the base at that position

Block Diagram for Building a PSSM – Aligned Sequences

- Inputs: Set of Aligned Sequence Features; Expected frequencies of each sequence element
- Process: PSSM builder
- Output: PSSM

Block Diagram for Building a PSSM – Unaligned Sequences

- Inputs: Set of unaligned sequences; Expected frequencies of each sequence element; Parameters for aligning (i.e., expected length)
- Process: PSSM builder
- Output: PSSM

Block Diagram for Searching with a PSSM

- Inputs: PSSM; Set of Sequences to search; Threshold
- Process: PSSM search
- Outputs: Sequences that match above threshold; Positions and scores of matches

Block Diagram for Searching for sequences related to a family with a PSSM
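Returning to the slides on frequency matrices, pseudo-counts, and log-ratio scores: the following is a minimal sketch in base MATLAB (no Bioinformatics Toolbox) of how those three steps fit together for an aligned set of sequences. The toy sequences, the pseudo-count value `epsilon`, the column renormalization, the uniform background `f`, and the choice of log base 2 are all assumptions made for illustration; none of them come from the slides.

```matlab
% Toy aligned DNA motif instances (hypothetical data, one sequence per row).
seqs = ['TATAAT'; 'TATGAT'; 'TACAAT'; 'TATAAT'; 'GATACT'];
alphabet = 'ACGT';              % alphabet of n characters
[numSeqs, m] = size(seqs);      % m is the feature length
n = numel(alphabet);

% Count how often each character occurs at each position (n-by-m matrix).
counts = zeros(n, m);
for j = 1:n
    counts(j, :) = sum(seqs == alphabet(j), 1);
end

% Frequency matrix: every column sums to 1.
freq = counts / numSeqs;

% Pseudo-counts: replace zero frequencies with a small number so that the
% logarithm is defined. The value of epsilon and the renormalization of
% each column afterwards are assumptions of this sketch.
epsilon = 0.01;
adj = freq;
adj(adj == 0) = epsilon;
adj = adj ./ repmat(sum(adj, 1), n, 1);

% Expected frequency f(j) of each character; a uniform background is
% assumed here instead of frequencies from a large set of sequences.
f = repmat(1 / n, n, 1);

% score(j,i) = log( m(j,i) / f(j) ), computed here in bits (log base 2).
pssm = log2(adj ./ repmat(f, 1, m));

disp(freq);
disp(pssm);
```

Rows of `freq` and `pssm` follow the order of `alphabet`, and columns follow positions in the feature, matching the n by m layout in the definition. Positive scores mark characters seen more often than the background predicts at that position; negative scores mark characters seen less often.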
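The search step (PSSM, sequence, and threshold in; positions and scores of matches out) can be sketched in base MATLAB as below. The PSSM values, the target sequence, and the threshold are hypothetical. Summing the looked-up scores across the window is the usual way to combine per-position scores, but it is an assumption here, since the slides only describe the lookup itself.

```matlab
alphabet = 'ACGT';

% Hypothetical 4-by-3 PSSM: rows are A, C, G, T; columns are motif positions.
pssm = [ 1.0 -2.0 -2.0;
        -2.0 -2.0 -2.0;
        -2.0 -2.0  1.5;
        -2.0  1.2 -2.0];

target = 'CCATGGATGA';   % hypothetical sequence to search (A, C, G, T only)
threshold = 3.0;         % hypothetical score cutoff

m = size(pssm, 2);                       % motif length
[~, idx] = ismember(target, alphabet);   % row index of each base in the PSSM
numWindows = numel(target) - m + 1;

% Treat every position in the target as a candidate match.
scores = zeros(1, numWindows);
for p = 1:numWindows
    window = idx(p:p + m - 1);
    % Look up the score of the observed base at each motif position and sum.
    lin = sub2ind(size(pssm), window, 1:m);
    scores(p) = sum(pssm(lin));
end

% Report positions and scores of matches at or above the threshold.
hits = find(scores >= threshold);
for p = hits
    fprintf('match at position %d, score %.2f\n', p, scores(p));
end
```

With these toy values the only windows that clear the threshold are the two occurrences of `ATG`, at positions 3 and 7, each scoring 1.0 + 1.2 + 1.5 = 3.7. The sketch assumes the target contains only characters from `alphabet`; any other character would give a zero index and make `sub2ind` fail.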