Stanford CS 374 - Finding the Beta Helix - D2923335

School: Stanford University

Course: CS 374 - Algorithms in Biology

Pages: 46

Outline: Finding the Beta Helix Motif; Papers; Secondary Structure; β-Sheet; Second and a Half Structure; β-Helix; BetaWrap; Hydrophobic/Charged; BetaWrap: Rungs; BetaWrap: Multiple Rungs; BetaWrap: Completing; Training; BetaWrap: Results; BetaWrap: Summary; Conditional Random Fields (CRFs); Hidden Markov Model; Viterbi Algorithm HMM; HMM Disadvantages; Viterbi CRFs; Segmented CRFs; Beta-Helix CRF; Intra-Node Features; Inter-Node Features; SCRF: Results; Summary; Questions

Finding the Beta Helix Motif
By Marcin Mejran

Papers
• "Predicting the β-Helix Fold from Protein Sequence Data" by Phil Bradley, Lenore Cowen, Matthew Menke, Jonathan King, and Bonnie Berger
• "Segmentation Conditional Random Fields (SCRFs): A New Approach for Protein Fold Recognition" by Yan Liu, Jaime Carbonell, Peter Weigele, and Vanathi Gopalakrishnan

Secondary Structure
• Beta strand: forms β-sheets
• Alpha helix: stands alone
• These can combine into more complex structures:
  • Beta sheets
  • Beta helices
Images from: http://www.people.virginia.edu/~rjh9u/prot2ndstruct.html

β-Sheet
[Figure]

Second and a Half Structure
[Figure: beta helix, beta barrel, beta trefoil]

β-Helix
• Helix composed of three parallel β-sheets
• Three β-strands per "rung"
• Connecting "loops"
• Not found in eukaryotes; secreted by various bacteria
• Both right- and left-handed forms exist

β-Helix
• Few solved structures:
  • 9 SCOP superfamilies
  • 14 right-handed (RH) solved structures in the PDB
• Solved structures differ widely

β-Helix
[Figure: rung with strands B1, B2, B3 and turn T2]
• T2 turn: a unique two-residue loop
• β-strands are 3 to 5 residues long
• T1 and T3 vary in size and may contain secondary structures
• β-strands interact between rungs

β-Helix
Good choice from a computational point of view:
• "Nice" structure
• Repeating parallel β-strands
• Rungs have similar structure
• Stacking is predictable
• β-strands are well conserved across superfamilies

β-Helix
• Long-term interactions: residues close in 3D but not in 1D
• "Non-unique" features:
  • B2-T2-B3 segment
• Unique features are not clearly shown in the sequence
• Usual methods don't work
Image from: http://www.cryst.bbk.ac.uk/PPS2/course/section10/all_beta.html

BetaWrap
• "Wraps" sequences around the helix and finds the best "wrap"
• Uses the B2 and B3 strands and the T2 turn
• The rest of the rung varies greatly in size
• Decomposes into sub-problems:
  • Rungs
  • Find multiple rungs
  • Find B1 by local optimization

Hydrophobic/Charged
• Hydrophobic: dislikes water
• Hydrophilic: likes water
• Charged: on the outside
[Figure: rung with B1, B2, T2, B3]
Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

BetaWrap: Rungs
• Given a T2 turn, find the next T2 turn
[Figure: candidate rung spanning B2, T2, B3]
Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

BetaWrap: Rungs
• More weight is given to inward pairs
• Certain stacked amino acids are preferred
• Highly charged inward residues are penalized
• Too few or too many residues are penalized
[Figure: rung with B1, B2, T2, B3]
Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

BetaWrap: Multiple Rungs
• Find multiple initial B2-T2-B3 segments
• Match a pattern based on hydrophobic residues (which appear on the inside):
  • Φ: A, F, I, L, M, V, W, Y
  • Charged: D, E, R, K
  • X: any
• Example: AFDEMVRKYE FIFDDEAK EDEMVMVFD

BetaWrap: Multiple Rungs
• Dynamic programming (DP) is used to find 5 rungs in either direction from the initial positions
• α-helix filtering
• Take the average score of the top 10 remaining wraps
Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

BetaWrap: Completing
• Find B1 positions using the highest-scoring parse; this does not affect the wrap score
• Further filtering on hydrophobic residues in T1 and T2

Training
• Seven-fold cross-validation, partitioned by families
• Scores calculated for:
  • α-helix filtering threshold
  • B1-score threshold
  • Hydrophobic count threshold
  • Distribution of unmatched residues between rungs
Image from: http://www.ornl.gov/info/ornlreview/v37_1_04/article_21.shtml

BetaWrap: Results
[Figure]

BetaWrap: Results
• Correctly identifies β-helices
• Correctly separates helices from non-helices
• Can predict β-helices across families

BetaWrap: Summary
Pros:
• Finds β-helices
• Accurate
Cons:
• Still makes errors (rung placement)
• Hard-coded information
• Over-fitting
• Hard to generalize

Conditional Random Fields (CRFs)
[Figure: graphical models over y1…y6 and x1…x6, for an HMM and for a CRF]

Hidden Markov Model
• Set of states
• Transition probabilities
• Emission probabilities
• Only the sequence of emitted residues is given; the goal is to find the sequence of true states
• Generative
[Figure: three states, each with emission table A: 0.2, B: 0.8]

Hidden Markov Model
HMM: maximize
$$P(x, y \mid \theta) = P(y \mid x, \theta)\, P(x \mid \theta)$$
• x: the emitted (given) sequence
• y: the "hidden" (true) states
• P(x, y | θ): joint probability of x and y
• P(y | x, θ): probability of y given x
• P(x | θ): probability of x
• Requires assumptions about the distribution of x

Viterbi Algorithm HMM
• Find the most likely path, i.e., the most likely sequence of hidden states
[Figure: trellis of emission probabilities e1(xi), e2(xi), e3(xi) over positions x1…x4]

Viterbi Algorithm HMM
$$v(i, j) = \max_{k} \; v(i-1, k)\, t_{k,j}\, e_j(x_i)$$
• v(i, j): probability of the best path that ends in state j at position i
• t_{k,j}: transition probability from state k to state j
• e_j(x_i): probability that state j emits x_i

HMM Disadvantages
• Strong independence assumption
• Long-term interactions are difficult to model
• Overlapping features are difficult to model

Conditional Random Fields (CRFs)
• Replace transition and emission probabilities with a set of feature functions f(i, j, k)
• Feature functions can depend on all of x, not just one position
• Not generative
[Figure: trellis of feature functions f over positions x1…x4]

Conditional Random Fields (CRFs)
• HMM: maximize P(x, y | θ) = P(y | x, θ) P(x | θ)
• CRF: maximize P(y | x, θ) directly
• No assumptions about the underlying distribution of x

Viterbi CRFs
• Same method as for HMMs
[Figure: trellis of feature functions f over positions x1…x4]

Conditional Random Fields (CRFs)
• States should form a chain
• The negative log-likelihood is convex for a chain
$$P(y \mid x) = \frac{1}{Z_0} \exp\Big(\sum_{i} \sum_{k} \lambda_k f_k(y_{i-1}, y_i, x, i)\Big)$$
• Z0: normalization factor, summed over all possible label sequences
• λk: weights

Segmented CRFs
• Each state corresponds to a structure
• Represented as a graph G:
  • States (nodes) represent secondary structures
  • Edges represent interactions
• Chains are nicer than graphs

Segmented CRFs
• G = <V, E1, E2>
• E1: edges between neighbors
• E2: edges for long-term interactions
• E1 edges can be implied in the model; only E2 needs to be explicitly considered
• However:
  • The graph needs to be a chain for E2
  • Deterministic state transitions

Beta-Helix CRF
[Figure]

Beta-Helix CRF
• Combined states
• B23: B2, B3, T2
• Size
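The BetaWrap: Multiple Rungs slide seeds the search by matching residues against a hydrophobic/charged pattern. A minimal sketch of just that matching step follows, assuming a hypothetical encoding ('H' for the Φ class A, F, I, L, M, V, W, Y; 'C' for the charged class D, E, R, K; 'X' for any residue) and a made-up three-symbol template. It is not the actual BetaWrap template or scoring, which also weighs stacked pairs, penalizes charged inward residues, and extends rungs with DP.

```python
# Residue classes taken from the "BetaWrap: Multiple Rungs" slide.
HYDROPHOBIC = set("AFILMVWY")  # the Φ class
CHARGED = set("DERK")          # charged residues

def matches(template: str, window: str) -> bool:
    """Check one window against a template of 'H' (hydrophobic), 'C' (charged), 'X' (any).

    The H/C/X encoding is a hypothetical stand-in for the slide's symbols.
    """
    for symbol, residue in zip(template, window):
        if symbol == "H" and residue not in HYDROPHOBIC:
            return False
        if symbol == "C" and residue not in CHARGED:
            return False
        # 'X' accepts any residue, so there is nothing to check.
    return True

def candidate_starts(sequence: str, template: str) -> list[int]:
    """Return the start offset of every window of the sequence that fits the template."""
    width = len(template)
    return [i for i in range(len(sequence) - width + 1)
            if matches(template, sequence[i:i + width])]

if __name__ == "__main__":
    # Hypothetical short template; a real B2-T2-B3 template would be longer.
    print(candidate_starts("AFDEMVRKYE", "HXC"))
```

Each returned offset would only be a seed; in BetaWrap the candidates are then scored and stacked into rungs.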
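The Training slide mentions seven-fold cross-validation partitioned by families, so that a family is never seen during both training and testing. One way to build such folds is sketched here, assuming hypothetical protein IDs and a greedy size-balancing rule (largest family into the emptiest fold) that the slide does not specify.

```python
from collections import defaultdict

def family_folds(family_of: dict[str, str], k: int = 7) -> list[list[str]]:
    """Partition protein IDs into k folds, keeping every family inside a single fold."""
    members = defaultdict(list)
    for protein, family in family_of.items():
        members[family].append(protein)
    folds: list[list[str]] = [[] for _ in range(k)]
    # Assumed greedy balancing: biggest family first, placed into the emptiest fold.
    for family in sorted(members, key=lambda f: (-len(members[f]), f)):
        emptiest = min(range(k), key=lambda i: len(folds[i]))
        folds[emptiest].extend(sorted(members[family]))
    return folds

def train_test_splits(folds: list[list[str]]):
    """Yield (train, test) pairs in which each fold is held out exactly once."""
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test

if __name__ == "__main__":
    labels = {"p1": "A", "p2": "A", "p3": "B", "p4": "C"}  # hypothetical IDs/families
    for train, test in train_test_splits(family_folds(labels, k=2)):
        print(train, test)
```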
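The Viterbi recurrence on the HMM slides, v(i, j) = max over k of v(i-1, k) · t_{k,j} · e_j(x_i), translates directly into a short dynamic program. This is a minimal sketch with a hypothetical two-state model: S1 reuses the A: 0.2, B: 0.8 emission table from the Hidden Markov Model slide, while the start probabilities, transition probabilities, and S2 emissions are made up. It uses raw probabilities to mirror the slide; for protein-length sequences, log probabilities avoid underflow.

```python
def viterbi(obs, states, start, trans, emit):
    """Most likely hidden-state path of an HMM and its joint probability P(x, y).

    start[j]    = P(y_1 = j)
    trans[k][j] = t_{k,j}, probability of moving from state k to state j
    emit[j][c]  = e_j(c), probability that state j emits residue c
    """
    # v[i][j]: probability of the best path that ends in state j at position i
    v = [{j: start[j] * emit[j][obs[0]] for j in states}]
    back = [{}]
    for i in range(1, len(obs)):
        v.append({})
        back.append({})
        for j in states:
            # v(i, j) = e_j(x_i) * max_k v(i-1, k) * t_{k,j}
            k_best = max(states, key=lambda k: v[i - 1][k] * trans[k][j])
            v[i][j] = emit[j][obs[i]] * v[i - 1][k_best] * trans[k_best][j]
            back[i][j] = k_best
    # Trace back from the best final state.
    last = max(states, key=lambda j: v[-1][j])
    path = [last]
    for i in range(len(obs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1], v[-1][last]

# Hypothetical model: S1 emissions come from the slide, everything else is made up.
STATES = ["S1", "S2"]
START = {"S1": 0.5, "S2": 0.5}
TRANS = {"S1": {"S1": 0.9, "S2": 0.1}, "S2": {"S1": 0.1, "S2": 0.9}}
EMIT = {"S1": {"A": 0.2, "B": 0.8}, "S2": {"A": 0.8, "B": 0.2}}

if __name__ == "__main__":
    print(viterbi("BBAA", STATES, START, TRANS, EMIT))
```

For "BBAA" this model returns the path S1, S1, S2, S2: each state strongly prefers one residue, which is enough to pay the 0.1 switching cost once.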
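The CRF slides replace probabilities with weighted feature functions normalized by Z0, and state that Viterbi works the same way as for HMMs. The following minimal linear-chain sketch assumes two hypothetical labels (B for strand, L for loop), two made-up features, and made-up weights λk. The forward recursion computes log Z0 in O(n·|labels|²) time and can be checked against brute-force enumeration; the Viterbi routine is the HMM dynamic program with sums of weighted features in place of products of probabilities. It does not model the segment-level E2 edges of the SCRF.

```python
import itertools
import math

# Hypothetical labels, features, and weights, for illustration only.
LABELS = ("B", "L")              # B = beta strand, L = loop
HYDROPHOBIC = set("AFILMVWY")

def f_hydrophobic_window(prev, cur, x, i):
    """1 if label B sits in a 3-residue window with at least 2 hydrophobic residues.
    It reads x[i-1] and x[i+1] too: CRF features may look at all of x."""
    window = x[max(0, i - 1):i + 2]
    return 1.0 if cur == "B" and sum(r in HYDROPHOBIC for r in window) >= 2 else 0.0

def f_same_label(prev, cur, x, i):
    """1 if the label does not change between positions i-1 and i."""
    return 1.0 if prev == cur else 0.0

FEATURES = (f_hydrophobic_window, f_same_label)
WEIGHTS = (1.5, 0.8)             # the lambda_k

def local(prev, cur, x, i):
    """Weighted feature sum at position i: sum_k lambda_k f_k(y_{i-1}, y_i, x, i)."""
    return sum(w * f(prev, cur, x, i) for w, f in zip(WEIGHTS, FEATURES))

def score(y, x):
    """Unnormalized log-score of labels y; there is no label before position 0."""
    return sum(local(y[i - 1] if i else None, y[i], x, i) for i in range(len(x)))

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_z_forward(x):
    """log Z0 via the forward recursion in O(n * |LABELS|^2)."""
    alpha = {c: local(None, c, x, 0) for c in LABELS}
    for i in range(1, len(x)):
        alpha = {c: logsumexp([alpha[p] + local(p, c, x, i) for p in LABELS])
                 for c in LABELS}
    return logsumexp(list(alpha.values()))

def log_z_brute(x):
    """log Z0 by enumerating all |LABELS|^n label sequences (tiny inputs only)."""
    return logsumexp([score(y, x) for y in itertools.product(LABELS, repeat=len(x))])

def prob(y, x):
    """P(y | x) = exp(score(y, x)) / Z0."""
    return math.exp(score(y, x) - log_z_forward(x))

def crf_viterbi(x):
    """Best-scoring labels: the HMM Viterbi DP with sums of weighted features
    instead of products of probabilities. Z0 is skipped: it is the same for every y."""
    best = {c: local(None, c, x, 0) for c in LABELS}
    back = []
    for i in range(1, len(x)):
        ptr, new = {}, {}
        for c in LABELS:
            p = max(LABELS, key=lambda q: best[q] + local(q, c, x, i))
            ptr[c], new[c] = p, best[p] + local(p, c, x, i)
        back.append(ptr)
        best = new
    last = max(LABELS, key=lambda c: best[c])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return tuple(reversed(path)), best[last]

if __name__ == "__main__":
    seq = "AVLDKGV"
    labels, s = crf_viterbi(seq)
    print(labels, s, prob(labels, seq))
```

Because Z0 is shared by every label sequence, decoding only needs the unnormalized scores; Z0 matters when reporting P(y | x) or training the weights.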
