Atomistic Protein Folding Simulations on the Submillisecond Timescale Using Worldwide Distributed Computing
Qing Lu
CMSC 838 Presentation
CMSC 838T – Presentation

Overview
- Overview of talk
  - Motivation
  - Challenge
  - Methods
  - Ensemble Dynamics
  - Folding@Home
  - Evaluation
  - Observations

Motivation
- Atomistic simulation of protein folding
  - understand dynamics of folding
  - real-time folding in full atomic detail
  - large-scale parallelization methods
- Benefits
  - protein folding & disease
    - proteins self-assemble to function
    - protein misfolding leads to disease
  - nanotechnology
    - nanomachines
    - self-assembly on the nanoscale

Challenge
- Difficulties
  - limited by current computational techniques
    - fastest proteins fold in microseconds
    - one CPU: 1 ns/day, 30 years
    - 10,000-fold computational gap
    - 1,000 CPUs: 1 microsecond/day
  - traditional parallelization scheme
    - hard to scale to a large number of processors
    - requires extremely fast communication
    - complexity of coordination
  - expensive supercomputers
    - cost
    - time-sharing

Method
- Ensemble dynamics
  - a new simulation algorithm
  - parallel simulation
- Folding@Home
  - heterogeneous network, Internet
  - large-scale distributed platform

Simulation of Dynamics
- Free energy barrier
  - progress from one state to another: transition
  - thermal fluctuations push the system over the free energy barrier
- Previous approaches: sampling
  - may get stuck in metastable free energy minima
  - expensive computational cost of sampling

Ensemble Dynamics
- Application scenario
  - waiting time of transitions dominates total time
  - protein folding
    - transition: free energy barrier crossing
    - coupled simulations: transition coupling
- Algorithm
  - start M independent simulations from an initial condition
  - wait for the first simulation to cross the free energy barrier
  - the first crossing takes M times less time than the average crossing time
  - restart M simulations from the new location after the transition
- Near-linear speedup in number of processors
  - exponential kinetics: f(t) = 1 - exp(-k t)
  - if k t is small, f(t) ≈ k t
  - M simulations: M f(t) ≈ M k t folding events

Limitations
- Barrier crossing probability
  - exponential assumptions
- Correct transition detection
  - transition: free energy barrier crossing
  - a large variance in energy: threshold
  - correct detection is not guaranteed
- Multiple possible transitions
  - not addressed
  - selection of the first transition

Distributed Computing
- Distributed simulations
  - M processors for each run
  - simulate folding in atomic detail on each processor
  - restart once a barrier-crossing event occurs
- Implementation: Folding@Home
  - worldwide distributed computing: Internet
  - started in October 2000
  - more than 200,000 participants
  - 10,000 CPU-years in the first 12 months

Folding@Home
- Client-server architecture
  - server assigns jobs (work units) to clients
  - client sends back results after computation
  - ~100K data transfer between client and server
- Why is ensemble dynamics good for Folding@Home?
  - CPU-intensive jobs: a few hours, often days
  - connection speed: a modem is good enough
  - suitable for Folding@Home

Other@Home Work
- SETI@Home
  - search for intelligent life outside Earth
  - data analysis of signals
- FightAIDS@Home
  - find drug therapy for HIV
  - how drugs interact with various HIV mutations
- Distributed projects
  - divide-and-conquer
  - CPU-intensive jobs
  - small pieces of data (kilobytes) transferred
  - communication not a major concern

Evaluation
- Folding@Home
  - based on Tinker molecular dynamics code
  - voluntary participants worldwide, over 400,000 CPUs
- Simulate folding and unfolding
  - folding rates
  - simulations on small proteins

Folding Rates

Folding & Unfolding

Observations
- Sampling
  - too expensive to run for long timescales
  - wastes too much time lingering in local energy minima
- Ensemble dynamics
  - speeds up simulations of dynamics
  - biological meaning of simulation results?
  - results on large protein folding?
  - limitations: correct transition detection, transition probability
- Folding@Home
  - cheap way to achieve supercomputing power
  - huge distributed computing platform: over 400,000 CPUs
  - an efficient approach for CPU-intensive jobs
- Complexity of problems and size of data increase rapidly
  - finding a better algorithm is preferable to buying
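
The near-linear speedup claimed on the Ensemble Dynamics slide can be checked numerically. The sketch below is a toy model, not the Folding@Home code: it assumes each simulation's barrier-crossing time is exponentially distributed with rate k (the exponential-kinetics assumption from the slides), and it ignores transition detection and the restart step. The function names and the value of k are hypothetical, chosen only for illustration.

```python
import math
import random


def mean_first_crossing_time(k, m, trials, rng):
    """Estimate the mean time until the FIRST of m independent runs crosses.

    Each run's waiting time is drawn from an exponential distribution with
    rate k, which is the exponential-kinetics assumption f(t) = 1 - exp(-k t).
    """
    total = 0.0
    for _ in range(trials):
        # The ensemble stops as soon as any one of the m runs crosses.
        total += min(rng.expovariate(k) for _ in range(m))
    return total / trials


def fraction_folded(k, t):
    """Fraction of runs that have crossed by time t: f(t) = 1 - exp(-k t)."""
    return 1.0 - math.exp(-k * t)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    k = 0.1                 # hypothetical rate, in crossings per microsecond
    trials = 5000

    single = mean_first_crossing_time(k, 1, trials, rng)
    for m in (1, 10, 100):
        ensemble = mean_first_crossing_time(k, m, trials, rng)
        print(f"M={m:4d}  mean wait={ensemble:8.4f}  speedup={single / ensemble:7.2f}")

    # For small k*t the folded fraction is close to k*t, so M runs yield
    # roughly M*k*t folding events.
    t = 0.01
    print(f"f(t)={fraction_folded(k, t):.6f}  k*t={k * t:.6f}")
```

Under these assumptions the estimated speedup should track M closely, because the minimum of M independent exponential waiting times is itself exponential with rate M k. If real crossing times are not exponential, which the Limitations slide flags as an open assumption, the speedup need not be linear.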