CMU BSC 03510 - lecture
Computational Biology, Part 2
Sequence Motifs

Robert F. Murphy
Copyright © 1996, 1999-2009. All rights reserved.

Slides from Chapter 4 (Ch04_Motifs_mod.ppt)

Slide titles:
- Describing features using frequency matrices
- Slide 4
- Frequency matrices (continued)
- Matlab Demonstration
- Frequency matrix
- Logo Example
- Logos for displaying sequence motifs
- Frequency Matrices, PSSMs, and Profiles
- Methods for converting frequency matrices to PSSMs
- Pseudo-counts
- Finding occurrences of a sequence feature using a Profile
- Block Diagram for Building a PSSM – Aligned Sequences
- Block Diagram for Building a PSSM – Unaligned Sequences
- Block Diagram for Searching with a PSSM
- Block Diagram for Searching for sequences related to a family with a PSSM
- Consensus sequences vs. PSSMs
- Consensus sequences vs. frequency matrices
- Reading for next class

Describing features using frequency matrices

Goal: Describe a sequence feature (or motif) more quantitatively than is possible using consensus sequences.
We need to describe how often particular bases are found at particular positions in a sequence feature.

Describing features using frequency matrices

Definition: For a feature of length m using an alphabet of n characters, a frequency matrix is an n by m matrix in which each element contains the frequency at which a given member of the alphabet is observed at a given position in an aligned set of sequences containing the feature.

Frequency matrices (continued)

Three uses of frequency matrices:
- Describe a sequence feature
- Calculate the probability of occurrence of the feature in a random sequence
- Calculate the degree of match between a new sequence and a feature

Matlab Demonstration

    % read some aligned sequences provided with the bioinformatics toolbox
    seqs = fastaread('pf00002.fa');
    seqdisp(seqs);
    startposition=4; endposition=13;
    [P,S] = seqprofile(seqs,'limits',[startposition endposition]);
    disp([' ' sprintf('%2d ',[1:size(P,2)])]);
    for i=1:length(S)
        disp([S(i) ' ' sprintf('%4.3f ',P(i,:))])
    end
    seqlogo(seqs,'startat',startposition,'endat',endposition,'alphabet','aa');

Frequency matrix

[Figure not included in the text preview.]

Logo Example

[Figure not included in the text preview.]

Logos for displaying sequence motifs

http://www.ccrnp.ncifcrf.gov/~toms/sequencelogo.html
Free logo maker at http://weblogo.berkeley.edu/

Frequency Matrices, PSSMs, and Profiles

A frequency matrix can be converted to a Position-Specific Scoring Matrix (PSSM) by converting frequencies to scores.
PSSMs are also called Position Weight Matrices (PWMs) or Profiles.

Methods for converting frequency matrices to PSSMs

1. Using the log ratio of observed to expected frequencies:

$$\mathrm{score}(j,i) = \log \frac{m(j,i)}{f(j)}$$

where m(j,i) is the frequency of character j observed at position i and f(j) is the overall frequency of character j (usually in some large set of sequences).

2. Using an amino acid substitution matrix (Dayhoff similarity matrix) [see later].

Pseudo-counts

How do we get a score for a position with zero counts for a particular character? We can't take log(0).
Solution: add a small number to all positions with zero frequency.

Finding occurrences of a sequence feature using a Profile

As with finding occurrences of a consensus sequence, we consider all positions in the target sequence as candidate matches.
For each position, we calculate a score by "looking up" the value corresponding to the base at that position.

Block Diagram for Building a PSSM – Aligned Sequences

Inputs: set of aligned sequence features; expected frequencies of each sequence element
Process: PSSM builder
Output: PSSM

Block Diagram for Building a PSSM – Unaligned Sequences

Inputs: set of unaligned sequences; expected frequencies of each sequence element; parameters for aligning (i.e., expected length)
Process: PSSM builder
Output: PSSM

Block Diagram for Searching with a PSSM

Inputs: PSSM; set of sequences to search; threshold
Process: PSSM search
Outputs: sequences that match above threshold; positions and scores of matches

Block Diagram for Searching for
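
The frequency-matrix definition given in the slides can be illustrated with a minimal Python sketch. This is not the course's MATLAB demonstration: the function name `frequency_matrix`, the DNA alphabet, and the four example sites are assumptions made up for illustration. Characters outside the alphabet (such as gaps) still count toward the column total here, which is one of several possible conventions.

```python
from collections import Counter


def frequency_matrix(aligned, alphabet="ACGT"):
    """Build an n-by-m frequency matrix as a dict: character -> list of m frequencies."""
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    m = len(aligned[0])
    if any(len(seq) != m for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    matrix = {ch: [0.0] * m for ch in alphabet}
    for i in range(m):
        counts = Counter(seq[i] for seq in aligned)  # characters seen in column i
        for ch in alphabet:
            matrix[ch][i] = counts[ch] / len(aligned)
    return matrix


# Hypothetical aligned sites, invented for illustration only.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
fm = frequency_matrix(sites)
for ch in "ACGT":
    print(ch, " ".join(f"{x:4.2f}" for x in fm[ch]))
```

Each row of the printed table corresponds to one character of the alphabet and each column to one position in the feature, matching the n by m layout in the definition.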

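
The log-ratio conversion and the pseudo-count fix described in the slides might be sketched in Python as follows. Several details are assumptions, not taken from the slides: the natural logarithm (the slides do not state a base), the pseudo value of 0.01, the uniform background frequencies, and the toy frequency matrix. The sketch follows the slide literally by substituting a small number only where the frequency is zero; adding pseudo-counts to every count before normalizing is a common alternative.

```python
import math


def pssm_from_frequencies(freqs, background, pseudo=0.01):
    """Convert frequencies m(j,i) to scores log(m(j,i) / f(j)).

    Zero frequencies are replaced by `pseudo` so the logarithm is defined.
    """
    pssm = {}
    for ch, row in freqs.items():
        pssm[ch] = [
            math.log((m if m > 0 else pseudo) / background[ch]) for m in row
        ]
    return pssm


# Toy 4-by-3 frequency matrix and uniform background, both hypothetical.
freqs = {
    "A": [0.0, 1.0, 0.0],
    "C": [0.0, 0.0, 0.25],
    "G": [0.0, 0.0, 0.0],
    "T": [1.0, 0.0, 0.75],
}
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pssm = pssm_from_frequencies(freqs, background)
for ch in "ACGT":
    print(ch, " ".join(f"{s:6.2f}" for s in pssm[ch]))
```

A character observed exactly as often as the background predicts scores 0, over-represented characters score positive, and characters never observed at a position receive a large negative score instead of an undefined one.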

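
Searching with a PSSM, as outlined in the slides (every position is a candidate match, scores are looked up per base, and matches above a threshold are reported), could look like the Python sketch below. The slides in this preview do not say how the looked-up values are combined; summing them over the window is the usual practice and is an assumption here. The function name `scan`, the toy scoring matrix, the target sequence, and the threshold are all hypothetical.

```python
def scan(pssm, sequence, threshold):
    """Return (start, score) for every window scoring at or above threshold.

    Start positions are 0-based. Windows containing a character that is
    not in the PSSM (for example 'N') are skipped.
    """
    width = len(next(iter(pssm.values())))
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(ch not in pssm for ch in window):
            continue
        # Look up the score of each base at its position and add them up.
        score = sum(pssm[ch][i] for i, ch in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy PSSM of width 3 that favors the motif ATA (made-up scores).
toy_pssm = {
    "A": [1.0, -1.0, 1.0],
    "C": [-1.0, -1.0, -1.0],
    "G": [-1.0, -1.0, -1.0],
    "T": [-1.0, 1.0, -1.0],
}
hits = scan(toy_pssm, "GGATACATAT", threshold=2.0)
print(hits)  # [(2, 3.0), (6, 3.0)]
```

The inputs and outputs mirror the block diagram for searching with a PSSM: a PSSM, a sequence to search, and a threshold go in; positions and scores of matches come out.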