DOC PREVIEW
UMD CMSC 838T - Whole Genome Alignment using Multithreaded Parallel Implementation

This preview shows page 1-2-3-4-5-6 out of 18 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Whole Genome Alignment using Multithreaded Parallel ImplementationTalk OverviewMotivationSolution..Pairwise Sequence Comparison using Dynamic ProgrammingDynamic ProgrammingContd….Identifying alignmentsEARTH Execution ModelEARTH ArchitectureMultithreaded parallel implementationComputation of similarity matrix on EARTHEvaluationSlide 14Result GraphsSlide 16ConclusionsRelated work –MUMmer(Maximal Unique Match)Whole Genome Alignment using Multithreaded Parallel ImplementationHyma S MurthyCMSC 838 PresentationCMSC 838T – PresentationTalk OverviewOrganization of the paperMotivationTechnique: Pairwise Sequence Comparison using Dynamic ProgrammingEARTH Execution ModelEvaluationResult GraphsConclusionsRelated Work (MUMmer)CMSC 838T – PresentationMotivationImportance of Genome Alignment :Identify important matched and mismatched regions“matches” represent homolog pairs, conserved regions or long repeats“mismatches”represent foreign fragments inserted by transposition, sequence reversal or lateral transferDetect functional differences between pathogenic/ non-pathogenic strains, evolutionary distance, mutations leading to disease, phenotypes, etc.ProblemsLarge computational power, memory and execution timeExisting algorithms apply dynamic programming only to subsequencesComputationally intensive to apply to whole sequences (O(n2))Thus applicable only to closely related genomesCMSC 838T – PresentationSolution..Multithreaded parallel implementation of sequence alignment algorithm to align whole genomesParallel implementation of dynamic programming techniqueUses collective memory of several nodesUses multithreading to overlap computation and communicationApplicable to closely related as well as less similar genomesReliable output in reasonable timeCMSC 838T – PresentationPairwise Sequence Comparison using Dynamic ProgrammingBasic Idea:Quantify the similarity between pairs of symbols of target sequencesAssociate score for each possible arrangementSimilarity is given by the highest scoreExample :sequence x A T A A G Tsequence y A T G C A G TSCORE 1 1 -1 –1 –1 –1 –1 TOTAL = -3sequence x A T A - A G Tsequence y A T G C A G TSCORE 1 1 -1 –2 1 1 1 TOTAL = 2Model mutation by “gaps” (gaps indicate evolution of one sequence into another)CMSC 838T – PresentationDynamic ProgrammingSmith and Waterman approach:Aligns subsequences of given sequencesInvolves: (a) calculation of scores indicating similarity (b) identification of alignment(s) corresponding to the scoreBuild solution using previous solutions for smaller subsequencesConstruct a two-dimensional array – “Similarity Matrix” to store scores corresponding to partial resultsMatrix represents all possible alignments of the input sequences Recurrence equationSM[i, j-1] + gpSM[i-1, j-1] + ssSM[i-1, j] + gp 0SM[i, j] =CMSC 838T – PresentationContd….Each element of the matrix is the max of the foll four values:Left element + gap, upper-left element + score of replacing verticalwith horizontal symbol, upper element + gap, 0.Consider the foll example T G A T G G A G G T 0 0 0 0 0 0 0 0 0 0 00 0 1 00 0 0 20000ATAGGG2 = max{0 + (-2), 1 + (1), 0 + (-2), 0}CMSC 838T – PresentationIdentifying alignmentsAlignments with score above a given threshold are reportedStart at end of the alignment and move backwards to the beginningT G A T G G A G G T 0 0 0 0 0 0 0 0 0 0 00 0 1 0 0 1 1 0 1 1 00 0 0 2 0 0 0 2 0 0 00 1 0 0 3 1 0 0 1 0 10 0 0 1 1 2 0 1 0 0 00 0 1 0 0 2 3 1 2 1 00 0 1 0 0 1 3 2 2 3 1ATAGGGT G A T – G G A G G T G A T A G GT G A T G G A G G T G A T A G GT G A T G G A G G T G A T A G GT G A T G G A G G T G A T A G GCMSC 838T – PresentationEARTH Execution ModelProgram is viewed as a collection of threadsexecution order determined by data and control dependenciesThreads further divided into fibers fibers are non-preemptive and all data is ready before their executionEach node in EARTH has an execution unit synchronization unitqueues linking the two (RQ and EQ)local memory interface to interconnection networkCMSC 838T – PresentationEARTH ArchitectureMemory busFrom RQTo EQPE PE PE…Local MemoryEUSUEQRQnodenodenode..Interconnection NetworkCMSC 838T – PresentationMultithreaded parallel implementationDivide scoring matrix as follows horizontal strips (each element of input sequence X) strips into rectangular blocksBlocks are calculated by two fibers within a thread only one fiber is active at any given timeEach thread is assigned to one horizontal strip the computation is done by even/ odd fibers within the threadInitialization delay of reading sequences from server is minimizedEach thread needs only the piece of input sequence it grabs and not the whole of sequence XAfter computing a block, fiber sends to fiber beneath a piece of sequence Yamong other informationThe computation of the anti-diagonal elements of the matrix is as shownCMSC 838T – PresentationComputation of similarity matrix on EARTHE fibers OE fibers OThread A Thread BP1P2P3Inactive fiberActive fiberAckSyncDataP1P2P3P4P1P2P3P4CMSC 838T – PresentationEvaluationExperimental environmentBeowulf implementation of EARTHUses Beowulf machine consisting of 64 nodes, each containing two 200MHz Pentium Pro processors (a total of 128 processors and 128MB of memory)Sequences of lengths ranging from 30K to 900K were testedExecution times for sequential and parallel implementation of Smith and Waterman algorithm is given below:Implementation TimeSeq. Smith-Waterman 53 hoursATGC on 16 nodes 3.3 hoursATGC on 32 nodes 2.1 hoursATGC on 64 nodes 1.3 hoursCMSC 838T – PresentationEvaluationThe multithreaded parallel implementation is named ATGC – Another Tool for Genomic ComparisonExperiment alignes human and mice mitochondrial genomeshuman and drosophila mitochondrial genomesReason for selectionhuman and mice are closely related and the other pair are less similarThe results were confirmed with MUMmer – another whole genome alignment toolResult graphs show that ATGC is more accurate than MUMmer(verified by using NCBI Blast)CMSC 838T – PresentationResult


View Full Document

UMD CMSC 838T - Whole Genome Alignment using Multithreaded Parallel Implementation

Documents in this Course
Load more
Download Whole Genome Alignment using Multithreaded Parallel Implementation
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Whole Genome Alignment using Multithreaded Parallel Implementation and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Whole Genome Alignment using Multithreaded Parallel Implementation 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?