DOC PREVIEW
Stanford CS 262 - Large-Scale Global Alignments Multiple Alignments

This preview shows page 1-2-15-16-17-32-33 out of 33 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Large-Scale Global Alignments Multiple AlignmentsARACHNE: Steps to Assemble a GenomeSlide 3Slide 4Slide 5Slide 6Slide 74. Derive Consensus SequenceSimulated Whole Genome ShotgunMaking a Simulated ReadHuman 22, Results of SimulationsNeurospora crassa Genome (Real Data)Mouse GenomeNext few lecturesRapid Global AlignmentsMotivationMain IdeaOutlineMethods to FIND Local AlignmentsFinding Local Alignments: Sorting k-long wordsSorting k-long words: exampleRunning timeSuffix TreesDefinition of a Suffix TreeConstructing a Suffix TreeNaïve Algorithm to Construct a Suffix TreeExample of Suffix Tree ConstructionFaster ConstructionMemory to Store Suffix TreeApplication: Find all Matches Between x and ySlide 31Application: Online Search of Strings on a DatabaseApplication: Common Substrings of k StringsLarge-Scale Global Alignments Multiple AlignmentsLecture 10, Thursday May 1, 2003Lecture 10, Thursday May 1, 2003ARACHNE: Steps to Assemble a Genome1. Find overlapping reads4. Derive consensus sequence..ACGATTACAATAGGTT..2. Merge good pairs of reads into longer contigs3. Link contigs to form supercontigsLecture 10, Thursday May 1, 20033. Link Contigs into SupercontigsToo dense: Overcollapsed?(Myers et al. 2000)Inconsistent links: Overcollapsed?Normal densityLecture 10, Thursday May 1, 2003Find all links between unique contigs3. Link Contigs into Supercontigs (cont’d)Connect contigs incrementally, if  2 linksLecture 10, Thursday May 1, 2003Fill gaps in supercontigs with paths of overcollapsed contigs3. Link Contigs into Supercontigs (cont’d)Lecture 10, Thursday May 1, 2003Define G = ( V, E )V := contigs E := ( A, B ) such that d( A, B ) < C Reason to do so: Efficiency; full shortest paths cannot be computed3. Link Contigs into Supercontigs (cont’d)d ( A, B )Contig AContig BLecture 10, Thursday May 1, 20033. Link Contigs into Supercontigs (cont’d)Contig AContig BDefine T: contigs linked to either A or BFill gap between A and B if there is a path in G passing only from contigs in TLecture 10, Thursday May 1, 20034. Derive Consensus SequenceDerive multiple alignment from pairwise read alignmentsTAGATTACACAGATTACTGA TTGATGGCGTAA CTATAGATTACACAGATTACTGACTTGATGGCGTAAACTATAG TTACACAGATTATTGACTTCATGGCGTAA CTATAGATTACACAGATTACTGACTTGATGGCGTAA CTATAGATTACACAGATTACTGACTTGATGGGGTAA CTATAGATTACACAGATTACTGACTTGATGGCGTAA CTADerive each consensus base by weighted votingLecture 10, Thursday May 1, 2003Simulated Whole Genome Shotgun•Known genomesFlu, yeast, fly, Human chromosomes 21, 22•Make “realistic” shotgun reads •Run ARACHNE•Align output with genome and compareLecture 10, Thursday May 1, 2003Making a Simulated ReadSimulated reads have error patterns taken from random real readsERRORIZERSimulated readartificial shotgun readreal readLecture 10, Thursday May 1, 2003Human 22, Results of SimulationsPlasmid/ Cosmid cov10 X / 0.5 X 5 X / 0.5 X 3 X/ 0 XN50 contig 353 Kb 15 Kb 2.7 KbMean contig 142 Kb 10.6 Kb 2.0 KbN50 scaffold 3 Mb 3 Mb 4.1 KbAvg base qual 41 32 26% > 2 kb 97.3 91.1 67Lecture 10, Thursday May 1, 2003Neurospora crassa Genome (Real Data)• 40 Mb genome, shotgun sequencing complete (WI-CGR)Coverage:1705 contigs368 supercontigs• 1% uncovered (of finished BACs)• Evaluated assembly using 1.5Mb of finished BACsEfficiency:Time: 20 hrMemory: 9 GbAccuracy:< 3 misassemblies compared with 1 Gb of finished sequenceErrors/106 letters:Subst. 260Indel: 164Lecture 10, Thursday May 1, 2003Mouse GenomeImproved version of ARACHNE assembled the mouse genomeSeveral heuristics of iteratively:Breaking supercontigs that are suspiciousRejoining supercontigsSize of problem: 32,000,000 readsTime: 15 days, 1 processorMemory: 28 GbN50 Contig size: 16.3 Kb  24.8 Kb N50 Supercontig size: .265 Mb  16.9 MbLecture 10, Thursday May 1, 2003Next few lecturesMore on alignmentsLarge-scale global alignment – Comparing entire genomesSuffix trees, sparse dynamic programmingMumMer, Avid, LAGAN, Shuffle-LAGANMultiple alignment – Comparing proteins, many genomesScoring, Multidimensional-DP, Center-Star, Progressive alignmentCLUSTALW, TCOFFEE, MLAGANGene recognition Gene recognition on a single genomeGENSCAN – A HMM for gene recognitionCross-species comparison-based gene recognitionTWINSCAN – A HMMSLAM – A pair-HMMRapid Global AlignmentsHow to align genomic sequences in (more or less!) linear timeLecture 10, Thursday May 1, 2003Motivation•Genomic sequences are very long:–Human genome = 3 x 109 –long–Mouse genome = 2.7 x 109 –long•Aligning genomic regions is useful for revealing common gene structure–Useful to compare regions > 1,000,000-longLecture 10, Thursday May 1, 2003Main IdeaGenomic regions of interest contain ordered islands of similarity–E.g. genes 1. Find local alignments2. Chain an optimal subset of themLecture 10, Thursday May 1, 2003Outline•Methods to FIND Local Alignments–Sorting k-long words–Suffix Trees•Methods to CHAIN Local Alignments–Dynamic Programming–Sparse Dynamic ProgrammingMethods to FIND Local Alignments1. Sorting K-long wordsBLAST, BLAT, and the like2. Suffix TreesLecture 10, Thursday May 1, 2003Finding Local Alignments: Sorting k-long wordsGiven sequences x, y:1. Write down all (w, 0, i): w = xi+1…xi+k(z, 1, j): z = yj+1…yj+k2. Sort them lexicographically3. Deduce all k-long matches between x and y4. Extend to local alignmentsLecture 10, Thursday May 1, 2003Sorting k-long words: exampleLet x, y be matched with 3-long words:x = caggc: (cag,0,0), (agg,0,1), (ggc,0,2)y = ggcag: (ggc,1,0), (gca,1,1), (cag,1,2)Sorted: (agg,0,1),(cag,0,0),(cag,1,2),(ggc,0,2),(ggc,1,0),(gca,1,1)Matches:1. cag: x1x2x3 = y3y4y52. ggc: x3x4x5 = y1y2y3Lecture 10, Thursday May 1, 2003Running time•Worst case: O(NxM)•In practice: a large value of k results in a short list of matchesTradeoff: Low k: worse running timeHigh k: significant alignments missedPatternHunter:Sampling non-consecutive positions increases the likelihood to detect a conserved region, for a fixed value of k – refer to Lecture 3Lecture 10, Thursday May 1, 2003Suffix Trees•Suffix trees are a method to find all maximal matches between two strings (and much more)Example: x = dabdacd a b d a ccabdaccccadb142563Lecture 10, Thursday May 1, 2003Definition of a Suffix TreeDefinition:For string x = x1…xm, a suffix tree is:–A rooted tree with m leavesLeaf i: xi…xm–Each edge is a substring–No two edges out of a node, start with same letterIt follows, every


View Full Document

Stanford CS 262 - Large-Scale Global Alignments Multiple Alignments

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Large-Scale Global Alignments Multiple Alignments
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Large-Scale Global Alignments Multiple Alignments and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Large-Scale Global Alignments Multiple Alignments 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?