Finding the Beta Helix Motif
Papers
Secondary Structure
β-sheet
Second and a half Structure
β-Helix
BetaWrap
Hydrophobic/charged
BetaWrap: Rungs
BetaWrap: Multiple Rungs
BetaWrap: Completing
Training
BetaWrap: Results
BetaWrap: Summary
Conditional Random Fields (CRFs)
Hidden Markov Model
Viterbi Algorithm HMM
HMM Disadvantages
Viterbi CRFs
Segmented CRFs
Beta-Helix CRF
Intra-Node Features
Inter-Node Features
SCRF: Results
Summary
Questions

Finding the Beta Helix Motif
By Marcin Mejran

Papers
• Predicting the β-Helix Fold From Protein Sequence Data, by Phil Bradley, Lenore Cowen, Matthew Menke, Jonathan King, Bonnie Berger
• Segmentation Conditional Random Fields (SCRFs): A New Approach for Protein Fold Recognition, by Yan Liu, Jaime Carbonell, Peter Weigele, and Vanathi Gopalakrishnan

Secondary Structure
• Beta strand: forms β-sheets
• Alpha helix: stands alone
• Can combine into more complex structures: beta sheets, beta helices
Images from: http://www.people.virginia.edu/~rjh9u/prot2ndstruct.html
[Figure: β-sheet]

Second and a half Structure
• Beta helix
• Beta barrel
• Beta trefoil

β-Helix
• Helix composed of three parallel β-sheets
• Three β-strands per "rung"
• Connecting "loops"
• Not in eukaryotes
• Secreted by various bacteria
• Right- and left-handed

β-Helix
• Few solved structures
• 9 SCOP superfamilies
• 14 right-handed (RH) solved structures in PDB
• Solved structures differ widely
[Figure labels: B3, T2, B2, B1]

β-Helix
• T2 turn: unique two-residue loop
• β-strands are 3 to 5 residues long
• T1 and T3 vary in size and may contain secondary structures
• β-strands interact between rungs

β-Helix
• Good choice from a computational point of view
• "Nice" structure
• Repeating parallel β-strands
• Rungs have similar structure
• Stacking is predictable
• Well-conserved β-strand across superfamilies

β-Helix
• Long-range interactions: close in 3D but not in 1D
• "Non-unique" features: B2-T2-B3 segment
• Unique features not clearly shown in sequence
• Usual methods don't work
Image from: http://www.cryst.bbk.ac.uk/PPS2/course/section10/all_beta.html

BetaWrap
• "Wraps" sequences around the helix
• Finds the best "wrap"
• Uses the B2 and B3 strands and the T2 turn
• Rest of the rung varies greatly in size
• Decomposes into sub-problems:
  • Rungs
  • Find multiple rungs
  • Find B1 by local optimization

Hydrophobic/charged
• Hydrophobic: dislikes water
• Hydrophilic: likes water
• Charged: on the outside
[Figure labels: B3, T2, B2, B1]
Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

BetaWrap: Rungs
• Given a T2 turn, find the next T2 turn
[Figure labels: B2, B3, T2, candidate rung]
Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

BetaWrap: Rungs
• More weight given to inward pairs
• Certain stacked amino acids preferred
• Penalty for highly charged inward residues
• Penalizes too few or too many residues
[Figure labels: B3, T2, B2, B1]
Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

BetaWrap: Multiple Rungs
• Find multiple initial B2-T2-B3 segments
• Match pattern based on hydrophobic residues (which appear on the inside)
  • Φ – A, F, I, L, M, V, W, Y
  • – D, E, R, K
  • X – Any
• AFDEMVRKYE  FIFDDEAK  EDEMVMVFD

BetaWrap: Multiple Rungs
• Dynamic programming (DP) is used to find 5 rungs in either direction from the initial positions
• α-helix filtering
• Take the average score of the top 10 remaining wraps
Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

BetaWrap: Completing
• Find B1 positions
• Highest-scoring parse
• Does not affect the wrap score
• Further filtering on hydrophobic residues in T1 and T2

Training
• Seven-fold cross-validation
• Partitioned based on families
• Scores calculated for:
  • α-helix filtering threshold
  • B1-score threshold
  • Hydrophobic count threshold
  • Distribution of unmatched residues between rungs
Image from: http://www.ornl.gov/info/ornlreview/v37_1_04/article_21.shtml

BetaWrap: Results
• Correctly identifies beta helices
• Correctly separates helices and non-helices
• Can predict β-helices across families

BetaWrap: Summary
Pros:
• Finds beta helices
• Accurate
Cons:
• Still makes errors
• Rung placement
• Hard-coded information
• Over-fitting
• Hard to generalize

Conditional Random Fields (CRFs)
[Figure: two graphical models over states y1 to y6 and observations x1 to x6, one labeled HMM and one labeled CRF]

Hidden Markov Model
• Set of states
• Transition probabilities
• Emission probabilities
• Only given the sequence of emitted residues
• Find the sequence of true states
• Generative
[Figure: three states, each with an emission table (Res, Prob): A 0.2, B 0.8]

Hidden Markov Model
• HMM: maximize P(x,y|θ) = P(y|x,θ) P(x|θ)
• x: emitted state / given sequence
• y: "hidden" / true state
• P(x,y|θ): joint probability of x and y
• P(y|x,θ): probability of y given x
• P(x|θ): probability of x
• Need to make assumptions about the distribution of x

Viterbi Algorithm HMM
• Find the most likely path, i.e. the most likely sequence of hidden states
[Figure: trellis of emission terms e1(xi), e2(xi), e3(xi) over observations x1 to x4]

Viterbi Algorithm HMM
[Figure: trellis of emission terms e1(xi), e2(xi), e3(xi) over observations x1 to x4]
• $v(i,j) = \max\big(v(i-1,1)\,t_{1,j}\,e_j(x_i),\; v(i-1,2)\,t_{2,j}\,e_j(x_i),\; \dots,\; v(i-1,k)\,t_{k,j}\,e_j(x_i)\big)$

HMM Disadvantages
• There is a strong independence assumption
• Long-range interactions are difficult to model
• Overlapping features are difficult to model

Conditional Random Fields (CRFs)
• Replace transition and emission probabilities with a set of feature functions f(i,j,k)
• Feature functions are based on all xs, not just one
• Not generative
[Figure: trellis of feature functions f(1,0,1) to f(3,i,4) over observations x1 to x4]

Conditional Random Fields (CRFs)
• HMM: maximize P(x,y|θ) = P(y|x,θ) P(x|θ)
• CRF: maximize P(y|x,θ)
• Do not make assumptions about the underlying distribution

Viterbi CRFs
• Same method as for HMM
[Figure: trellis of feature functions f(1,0,1) to f(3,i,4) over observations x1 to x4]

Conditional Random Fields (CRFs)
• States should form a chain
• Likelihood function is convex for a chain
• Z0 = normalization factor
• λk = weights

Segmented CRFs
• Each state corresponds to a structure
• Represented as a graph G
• States represent secondary structures
• Edges represent interactions
• Chains are nicer than graphs

Segmented CRFs
• G = <V, E1, E2>
• E1: edges between neighbors
• E2: edges for long-range interactions
• E1 edges can be implied in the model
• Only E2 needs to be explicitly considered
• However:
  • The graph needs to be a chain for E2
  • Deterministic state transitions

Beta-Helix CRF
• Combined states
• B23: B2, B3, T2
• Size
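
The recurrence on the "Viterbi Algorithm HMM" slide can be sketched in a few lines of Python. This is a minimal illustration, not code from either paper: the two state names, the transition and start probabilities, and the second emission table are hypothetical toy values, with only the A 0.2 / B 0.8 emission table taken from the HMM slide. Log-probabilities are used to avoid numerical underflow on long sequences.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its log-probability) for an HMM."""
    if len(obs) == 0:
        raise ValueError("observation sequence must not be empty")

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # v[i][j]: best log-probability of any path that ends in state j at position i
    v = [{j: log(start_p[j]) + log(emit_p[j][obs[0]]) for j in states}]
    back = [{}]  # back[i][j]: predecessor of state j on that best path

    for i in range(1, len(obs)):
        v.append({})
        back.append({})
        for j in states:
            # v(i,j) = max_k v(i-1,k) * t_{k,j} * e_j(x_i), computed in log space
            best_k = max(states, key=lambda k: v[i - 1][k] + log(trans_p[k][j]))
            v[i][j] = v[i - 1][best_k] + log(trans_p[best_k][j]) + log(emit_p[j][obs[i]])
            back[i][j] = best_k

    # Trace back from the best final state
    last = max(states, key=lambda j: v[-1][j])
    path = [last]
    for i in range(len(obs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, v[-1][last]


# Hypothetical two-state toy model over the residue alphabet {A, B}
STATES = ["strand", "loop"]
START_P = {"strand": 0.5, "loop": 0.5}
TRANS_P = {
    "strand": {"strand": 0.9, "loop": 0.1},
    "loop": {"strand": 0.1, "loop": 0.9},
}
EMIT_P = {
    "strand": {"A": 0.2, "B": 0.8},  # emission table shown on the HMM slide
    "loop": {"A": 0.8, "B": 0.2},    # assumed mirror image, for illustration only
}

if __name__ == "__main__":
    path, log_prob = viterbi("BBBAAA", STATES, START_P, TRANS_P, EMIT_P)
    print(path, log_prob)
```

With these toy parameters the decoder labels the B-rich prefix as strand and the A-rich suffix as loop, switching states once because a single low-probability transition costs less than several unlikely emissions.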
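
For reference, the conditional probability that a chain-structured CRF maximizes is usually written as below, using the symbols Z0 and λk from the CRF slides. The indexing of the feature functions f_k over adjacent states is an assumption made here for illustration (the slides write them as f(i,j,k)), and Z0 is taken to be the normalizing constant, which is its usual role.

```latex
P(\mathbf{y} \mid \mathbf{x}, \theta)
  = \frac{1}{Z_0}
    \exp\!\Big( \sum_{i=1}^{N} \sum_{k} \lambda_k \, f_k(y_{i-1}, y_i, \mathbf{x}, i) \Big),
\qquad
Z_0 = \sum_{\mathbf{y}'}
    \exp\!\Big( \sum_{i=1}^{N} \sum_{k} \lambda_k \, f_k(y'_{i-1}, y'_i, \mathbf{x}, i) \Big)
```

Because each f_k may look at the whole observation sequence x, no distribution over x has to be modeled, which is the contrast with the HMM objective P(x,y|θ) = P(y|x,θ) P(x|θ).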