Graphical Models for Protein KineticsOutlineBackground on ProteinsStructure PredictionPathways and KineticsFolding KineticsApplicationsRepresentation of a ProteinSlide 9Robotics Motion PlanningRoadmap MethodProtein Folding as a Search ProblemStochastic Roadmap Simulation (Apaydin et. al. 2003)Roadmap ConstructionRoadmap as a Markov ChainTransmission CoefficientsAlgebraic Method for Calculating Transmission CoefficientsTransmission Coefficients (cont)ResultsBenefits and DrawbacksMolecular dynamicsFolding@Home: Worldwide desktop grid computingMarkovian Model Method (Singhal et al. JCP 2004)Step 1: sampling of pathsStep 2: Generation of roadmapStep 3 (opt): Re-weighting of edgesCalculating Pfolds and MFPTEnergy landscape and initial pathwayResults - PfoldResults - MFPTResults – Trp zipper b-hairpinConclusionsGraphical Models for Protein KineticsNina SinghalCS374 PresentationNov. 1, 2005OutlineBackground material on proteinsWhy study protein kineticsGraphical models for kineticsMotion planning view (Apaydin et al, 2003)Molecular dynamics view (Singhal et al, 2004)ConclusionsBackground on ProteinsAlpha HelixBeta Strand and SheetBeta BarrelStructure PredictionGiven an amino acid sequence, what 3D structure will the protein form?MTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE ?Pathways and KineticsHow does a protein actually get from an unfolded configuration to a folded configuration??Folding KineticsRate of foldingUniqueness of pathwayOrder of secondary structure formationSecondary or tertiary structureApplicationsMisfolded proteins and diseasesAlzheimer'sCystic fibrosisMad cow diseaseIntermediates may be important as drug targetsProtein designRepresentation of a ProteinN1CCN2phipsiomegaRPsi A protein with n amino acids can be represented using 2n phi-psi angles, each in the range [0, 2)N1N2Graphical Models for Protein KineticsProtein conformations have different energies Graphical models discretize the conformation space and connect nearby regions with edgesRobotics Motion Planningc1c2Robot with 2 degrees of freedom022c1c22D configuration spaceMoving the robot arm from c1 to c2 is just finding a path in the configuration space from c1 to c2.Roadmap MethodRandomly sample points in configuration space. Keep feasible ones.Connect these points to form a graph.Process path queries using standard graph search techniques.022c2c1Protein Folding as a Search ProblemProtein folding can be represented as a search through the protein’s configuration spaceReplace collision free constraint with a preference for low energy configurationsInstead of finding any path, want to find all the energetically favorable pathsStochastic Roadmap Simulation (Apaydin et. al. 2003)Sample protein configuration at randomAdd edges between nearby nodesTake advantage of the many folding pathways contained within a roadmapEfficiently calculate many properties of the entire landscapeRoadmap ConstructionNodes in the graph are sampled uniformly at random Edges are added between nearest neighbors with probability:ikTEiijNeNPij11if Eij > 0otherwiseijijiiPP 1Roadmap as a Markov ChainWe can view the molecular motion as a random walk over the roadmapRoadmap can be regarded as discretely sampled version of Monte Carlo simulationIf fact, in the limit, probability distributions of Monte Carlo simulation and the roadmap convergeTransmission CoefficientsMeasures “kinetic distance” Probability that a conformation will fold before unfoldingCan calculate by starting many Monte Carlo simulations from the conformation Very computationally expensiveFolded stateUnfolded state??Algebraic Method for Calculating Transmission CoefficientsFUvivjPij UvFvjijFv Uvijijijjj jPPP,01Transmission Coefficients (cont)System of linear equationsOne equation and one unknown for each nodeCan be solved iterativelyLow connectivity of the graph results in a sparse matrix jvtjijtiP)()1(ResultsStudied a synthetic landscape and a real protein, ROPProtein was represented with 6 degrees of freedom, two vectors connected by a loopCorrelation of transmission coefficients calculated by roadmaps and Monte Carlo simulationsBenefits and DrawbacksExtremely efficient at calculating kinetic properties like transmission coefficientsUnclear whether low-dimension representation of protein is adequateMonte Carlo simulations may not be accurate enough for protein kineticsMolecular dynamics10-15femto10-12pico10-9nano10-6micro10-3milli100secondsBond vibrationIsomer-ationWaterdynamicsHelixformsFastestfolderstypicalfoldersslowfolderslong MD runwhere weneed to beMDstepwhere we’dlove to beSimulate protein movement using Newton’s laws of motionFolding@Home:Worldwide desktop grid computing~150,000 CPUs over the world (CPU locations from IP address)Markovian Model Method(Singhal et al. JCP 2004)Generate molecular dynamics trajectories from transition path sampling or independentlyCluster nearby points into macrostates to build roadmap with also include transition timeCalculate the mean first passage time and Pfold using linear algebraStep 1: sampling of pathsPick a random point from current pathShoot a path from this pointIf path reaches initial or final state by some cutoff time, stop simulation and accept itDefine new current path I F n0 n1 n2 n3 n4 n5 n6 np0 np1 np2 np0 np1 np2Step 2: Generation of roadmapNodes are accepted points, edges connect successive nodesCluster nearby points to make roadmap more connectedCalculate edge weights by counting number of transitions between nodes and normalize P12, t12 P14, t14 n1 n2 n3 n4 n5 P23, t23 P45, t45 P12t12 + P14t14 t’12 = P12 + P14 n1 n2 n3 n5 P’23 = P23, t’23 = t23 P’25 = P45, t’25 = t45 P’12 = P12 + P14Step 3 (opt): Re-weighting of edgesCan analyze roadmap at parameter values other than the simulated ones without need for additional simulationsFor temperature, can re-weight edges by the relative probabilities at the two temperatures according to the dynamicsRenormalize edges so outgoing probability sums to oneCalculating Pfolds and MFPTEquation for each node is conditioned on which neighbor it transitions toOne equation and one unknown for each nodeCan be solved iterativelyijedgetjijtiPfoldPPfold)()1(1,0 FIedgejijiPfoldPfoldPfoldPPfoldij0)(FedgejijijiMFPTMFPTtimePMFPTijijedgetjijijtiMFPTt imePMFPT
View Full Document