UMD CMSC 838T - Solving the Protein Threading Problem in Parallel

Unformatted text preview:

Solving the Protein Threading Problem in Parallel Nocola Yanev, Rumen AndonovMotivationTalk OverviewTechniquesPowerPoint PresentationReduction to Network Flow: An ExampleReduction to Network Flow: Variables and ConstraintsDrawbacks of ApproachParallel SolutionImproving Parallel SolutionEvaluationResult TablesSlide 13Slide 14Slide 15Related WorkObservationsSlide 18Slide 19Solving the Protein Threading Problem in ParallelNocola Yanev, Rumen AndonovIndrajit BhattacharyaCMSC 838T PresentationCMSC 838T – PresentationMotivationProblem paper is trying to solve3D structure prediction using threadingIs a given target sequence likely to fold to a 3D template core?Find the alignment that minimizes some score functionNP-complete; optimal solution not possibleMAX-SNP-hard; arbitrary approximation not possibleWhy do we care3D structure determines biological function of proteinAmino acid sequence (almost) uniquely determines 3D structureThreading is usually less accurate than comparative modeling but easier to solveCMSC 838T – PresentationTalk OverviewOverview of talkMotivationTechniquesEvaluationRelated workObservationsCMSC 838T – PresentationTechniquesApproachReduce the problem to some known theoretical problem of interestIn this case, network flowUse existing tools for solving the theoretical problem efficientlyCPLEXExplore possibilities for parallelizing the problemInvestigate the intrinsic hardness for real biological examplesCMSC 838T – PresentationTwo structurally similar proteins Spatial adjacencies (interactions) Possible threading with a sequence Objective functionMathematical FormulationCMSC 838T – PresentationReduction to Network Flow: An ExampleCMSC 838T – PresentationReduction to Network Flow:Variables and ConstraintsStandard Network FlowVariable xi,t for each segment to position assignmentRestricted to [0, 1]With standard flow conservation constraintsAdditional cost for non-local interactionsVariable zi,t,i’,t’ for each non-local interactionRestricted to {0, 1}Constrained to sum to 1 for each non-local pair (i, i’)Upper bounded by flow entering (i, t) and leaving (i’, t’)CMSC 838T – PresentationDrawbacks of ApproachInteger programming is hard to solve!Relax to linear programming with (0, 1) variablesApproximate to integer solution using standard heuristicsExisting tools like CPLEXHuge number of variablesFor 36 segments and 81 positions, IP problem has 741264 rows, 360945 columns and 54145231 non-zero variables!Need to reduce number of variables and constraintsCalls for parallelization if possibleCMSC 838T – PresentationParallel SolutionUtilize special flow constraints Split into sub-problems that may be solved parallelySplit the k-th layer in the graph into r intervalsForce path for a sub-problem to pass through a particular interval in the layerPass best bound for objective function found so far as parameter to sub-problemSub-task aborts when dual objective function exceeds the current best boundCMSC 838T – PresentationImproving Parallel SolutionDrawback: Hardest Sub-Problem Dominates! Parallel strategy was found to be slower than the sequential!Sub-problems can potentially become harder to solveMany more difficult sub-problems than easy onesSolution: Break the atomicity of the tasksEach sub-task periodically checks the current best bound and updates its cut-offExtra overhead is still small compared to task granularityNow the easiest executing sub-task dominates!CMSC 838T – PresentationEvaluationExperimental environmentReal protein sequencesILOG CPLEX Callable LibrarySUN Ultra-Sparc II, 450 MhzObjective function coefficients generated from FROSTMaximum of 7 processors and 29 sub-problemsEvaluation resultsSequential version much faster than previous branch-and-bound results for the same problem formulationTime taken comparable to PROSPECTSplitting and parallelization significantly improve turnaroundReally tiny gap between relaxed LP and ILP solutionsMostly integer solutions even for relaxed LP!CMSC 838T – PresentationResult TablesComparison with branch and bound algorithmComment: Self threading results in significantly lower scores (as should be)CMSC 838T – PresentationResult TablesGap between relaxed LP and ILPComment: Tiny relaxation gap. (significance?)CMSC 838T – PresentationResult TablesSize of the LP formulationComment: LP problem size is still too large.CMSC 838T – PresentationResult TablesPerformance with parallel sub-tasksComment: Longer times with more sub-problems??CMSC 838T – PresentationRelated WorkSimilar / previous approachesLathrop and Smith, 1998Uses same cost functionBranch and bound algorithm for searching the space of threadings Xu, Xu and Uberbacher, 1998Divide and conquer algorithmXu, Li, Lin, Kim and Xu, 2003Linear programming formulationSolved using b&b algorithmNone of the above suggest any parallelizing schemeCMSC 838T – PresentationObservationsPoints of InterestMapping to a known problem of interestNicely utilizes particular constraints to break into independent subtasksThreading of real amino acid sequences seems possibleRaises interesting questions about real-life protein threading being in PSolver tailored for this particular problem may yield better resultsCMSC 838T – PresentationObservationsCriticismNot enough experiments with large number of subtasks and processors to show scalingProhibitively large number of variables and constraintsHow accurate are the objective function coefficients?What is the resolution of the objective function?Threading onto multiple sequences for prediction still looks dauntingNot clear how to extend the idea for 3-way and more complex interactionsImprovementsSeems possible to break up the sub-tasks recursivelyCMSC 838T – PresentationThank


View Full Document

UMD CMSC 838T - Solving the Protein Threading Problem in Parallel

Documents in this Course
Load more
Download Solving the Protein Threading Problem in Parallel
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Solving the Protein Threading Problem in Parallel and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Solving the Protein Threading Problem in Parallel 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?