DOC PREVIEW
CMU CS 10810 - equence Analysis & Evolution (part II)

This preview shows page 1-2-24-25 out of 25 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1GraduateGraduate ComputationalComputationalGenomicsGenomics02-710 / 10-81002-710 / 10-810 & MSCBIO2070& MSCBIO2070Sequence Analysis & EvolutionSequence Analysis & Evolution(part II)(part II)Takis BenosTakis BenosLecture #8, February 8, 2007Lecture #8, February 8, 2007Reading: Reading: Durbin Durbin et al.et al. ““Biological Sequence AnalysisBiological Sequence Analysis””,Ch. 2, 6,Ch. 2, 6Benos 02-710/MSCBIO2070 8-FEB-2007 2OutlineOutline Sequence evolution Substitution models: DNA Substitution models: proteins Pairwise comparisons Global alignment Local alignment Database searches BLAST, FASTA Multiple sequence alignments2Benos 02-710/MSCBIO2070 8-FEB-2007 3Global alignmentGlobal alignmentGiven two strings, S and T, find their relativearrangement (incl. gaps) with the highest “score”. G A A T T C A G T T A | | G G A T C G A G A A T T C - A G T T A | | | | | G G A - T C G A2 matches, 4 mism., 0 gaps5 matches, 1 mism., 2 gapsBenos 02-710/MSCBIO2070 8-FEB-2007 4Alignment: the problem (Alignment: the problem (cntdcntd))Scoring schemes: three possible situations…• Match• Mismatch• Gap• Linear• Convex• AffineHow much??3Benos 02-710/MSCBIO2070 8-FEB-2007 5Dynamic ProgrammingDynamic ProgrammingWe apply dynamic programming when: There are only a polynomial number of subproblems Align x1…xi to y1…yj Original problem is one of the subproblems Align x1…xM to y1…yN Each subproblem is easily solved from smaller subproblemsBenos 02-710/MSCBIO2070 8-FEB-2007 6Global alignmentGlobal alignmentMijSTi-1ij-1 jMi,j = MAXMi-1, j-1 + Score(Si,Tj )Mi,j-1 + γMi-1,j + γDNA matrixPAMBLOSUMGap penaltyNeedleman & Wunsch, 19704Benos 02-710/MSCBIO2070 8-FEB-2007 7The Needleman-WunschThe Needleman-WunschAlgorithmAlgorithm1. InitializationF(0,0) = F(0,j) = F(i,0) = 02. Iterationfor i=1,…,Mfor j=1,…,N- calculate optimal F(i,j)- store Ptr(i,j)3. Termination• F(M,N) is the optimal score (or F(M,j) or F(i,N) if reached first)• Ptr(M,N) can trace back the optimal alignmentTime O(NM) Space O(NM)Benos 02-710/MSCBIO2070 8-FEB-2007 8Alignment: adding scoresAlignment: adding scores((cntdcntd))Score(match) = 1Score(mismatch) = 0Score(gap) = 05Benos 02-710/MSCBIO2070 8-FEB-2007 9Alignment: adding scoresAlignment: adding scores((cntdcntd))(Seq #1) G A A T T C A G T T A | | | | | | (Seq #2) G G A - T C - G - - A Alignment:6 matches, 1 mism., 4 gapsBenos 02-710/MSCBIO2070 8-FEB-2007 10Scoring the gaps moreScoring the gaps moreaccuratelyaccurately• A naive modelGap penalty is linear to the gap lengthNature prefers to place gaps where other gaps exist• Convex gap penalty functionγ(n+1) - γ(n) ≤ γ(n) - γ(n-1)Time O(N2M)Space O(NM)(assume N>M)γ(n)γ(n)6Benos 02-710/MSCBIO2070 8-FEB-2007 11Scoring gaps: affine gapsScoring gaps: affine gaps• Affine gaps: a compromise betweenlinear and convex gap penaltiesγ(n) = -d - e * (n-1)d: gap initiation penaltye: gap extension penaltydeγ(n)Benos 02-710/MSCBIO2070 8-FEB-2007 12Local alignmentLocal alignmentGiven two sequences, S and T, find two subsequences,s and t, whose alignment has the highest “score”amongst all subsequence pairs.EGR2_MOUSE RP [YPCPAEGCDRRFSRSDELTRHIRIH] TGHKP [FQCRICMRNFSRSDHLTTHIRTH] TGEKP [FACDY--CGRKFARSDERKRHTKIH]EGR2_HUMAN RP [YPCPAEGCDRRFSRSDELTRHIRIH] TGHKP [FQCRICMRNFSRSDHLTTHIRTH] TGEKP [FACDY--CGRKFARSDERKRHTKIH]EGR2_BRARE RP [YPCPAEGCDRRFSRSDELTRHIRIH] TGHKP [FQCRICMRNFSRSDHLTTHIRTH] TGEKP [FACDF--CGRKFARSDERKRHTKIH]MIG1_KLULA -- [-------------------------] ---RP [YVCPICQRGFHRLEHQTRHIRTH] TGERP [HACDFPGCSKRFSRSDELTRHRRIH]MIG1_KLUMA -- [-------------------------] ---RP [YMCPICHRGFHRLEHQTRHIRTH] TGERP [HACDFPGCAKRFSRSDELTRHRRIH]MIG2_YEAST -- [-------------------------] ---RP [FRCDTCHRGFHRLEHKKRHLRTH] TGEKP [HHCAFPGCGKSFSRSDELKRHMRTH] [ ] :* [. * * * * * :* . *:* *] ***:* [. * * : *:**** .** : *]7Benos 02-710/MSCBIO2070 8-FEB-2007 13Local alignment (Local alignment (cntdcntd))MijSTi-1ij-1 jMi,j = MAXMi-1, j-1 + Score(Si,Tj )Mi,j-1 + γMi-1,j + γDNA matrixPAMBLOSUMGap penalty0Smith & Waterman, 1981Benos 02-710/MSCBIO2070 8-FEB-2007 14The Smith-WatermanThe Smith-WatermanAlgorithmAlgorithm1. InitializationF(0,0) = F(0,j) = F(i,0) = 02. Iterationfor i=1,…,Mfor j=1,…,N- calculate optimal F(i,j)- store Ptr(i,j)3. Termination• Find the end of the best alignment with FOPT = max{i,j} F(i,j) andtrace back OR• Find all alignments with F(i,j) > threshold and trace back8Benos 02-710/MSCBIO2070 8-FEB-2007 15Local Local vs.vs. global alignment global alignmentBenos 02-710/MSCBIO2070 8-FEB-2007 16Local Local vs.vs. global alignment global alignment((cntdcntd))9Benos 02-710/MSCBIO2070 8-FEB-2007 17Local alignment (Local alignment (cntdcntd))Characteristics of local alignments:• The alignment can start/end at any point in the matrix.• No negative scores in the alignment.• The mean value of the scoring matrix (e.g. PAM,BLOSUM) should be negative, but there should bepositive scores in the scoring matrix.Benos 02-710/MSCBIO2070 8-FEB-2007 18DNA and protein databasesDNA and protein databases• EMBL/GenBank/DDBJ database of nucleic acids10Benos 02-710/MSCBIO2070 8-FEB-2007 19DNA and protein databasesDNA and protein databases• EMBL/GenBank/DDBJ database of nucleic acids (cntd)Benos 02-710/MSCBIO2070 8-FEB-2007 20DNA and protein databasesDNA and protein databases• SWISS-PROT & TrEMBL database of proteins11Benos 02-710/MSCBIO2070 8-FEB-2007 21DNA and protein databasesDNA and protein databases• SWISS-PROT & TrEMBL database of proteins Benos 02-710/MSCBIO2070 8-FEB-2007 22Database searchesDatabase searches• Database searching consists of many pairwisealignments combined in one search.• It helps determining the function and the evolutionaryrelationships• Heuristic algorithms are used instead of DP. Why?• Size of SWISS-PROT + TrEMBL (Rel. 9.5):3.9M entries or 1,276M residues.• Exact algorithms are O(NM) fast.• Heuristic methods can look at a small fraction of thesearching space that will include all (or most) of thehigh scoring pairs.12Benos 02-710/MSCBIO2070 8-FEB-2007 23BLAST algorithmBLAST algorithm• Basic Local Alignment Search Tool - The method:• For each “word” (of fixed-length) in the query sequence,make a list of all neighbouring “words” that score abovesome threshold.• Scan the database for these words.• Perform


View Full Document

CMU CS 10810 - equence Analysis & Evolution (part II)

Download equence Analysis & Evolution (part II)
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view equence Analysis & Evolution (part II) and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view equence Analysis & Evolution (part II) 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?