Stanford CS 262 - DNA Sequencing


The Walking Method
1. Build a very redundant library of BACs with sequenced clone-ends (cheap to build)
2. Sequence some "seed" clones
3. "Walk" from seeds using clone-ends to pick library clones that extend left & right

Walking: An Example (figure)

Advantages & Disadvantages of Hierarchical Sequencing
Hierarchical Sequencing
  ADV. Easy assembly
  DIS. Build library & physical map; redundant sequencing
Whole Genome Shotgun (WGS)
  ADV. No mapping, no redundant sequencing
  DIS. Difficult to assemble and resolve repeats

The Walking Method: Motivation
Sequence the genome clone-by-clone without a physical map.
The only costs involved are:
  Library of end-sequenced clones (cheap)
  Sequencing

Walking off a Single Seed
• Low redundant sequencing
• Many sequential steps
Walking off a single clone is impractical.
Cycle time to process one clone: 1-2 months
1. Grow clone
2. Prepare & shear DNA
3. Prepare shotgun library & perform shotgun
4. Assemble in a computer
5. Close remaining gaps
A mammalian genome would need 15,000 walking steps!

Walking off Several Seeds in Parallel
• Few sequential steps
• Additional redundant sequencing
In general, a genome can be sequenced in ~5 walking steps, with <20% redundant sequencing.
(figure: efficient vs. inefficient walking)

Using Two Libraries
Solution: Use a second library of small clones.
Most inefficiency comes from closing a small ocean (gap) with a much larger clone.

Whole-Genome Shotgun Sequencing
• Cut the genome many times at random
• Clone the fragments in plasmids (2-10 Kbp) and cosmids (40 Kbp)
• Sequence forward-reverse paired reads (~500 bp each) from both ends of each insert, a known distance apart

ARACHNE: Steps to Assemble a Genome
1. Find overlapping reads
2. Merge good pairs of reads into longer contigs
3. Link contigs to form supercontigs
4. Derive consensus sequence (..ACGATTACAATAGGTT..)

1. Find Overlapping Reads
• Sort all k-mers in reads (k ~ 24)
• Find pairs of reads sharing a k-mer
• Extend to full alignment; throw away if not >95% similar
  TAGATTACACAGATTAC
  |||||||||||||||||
  TAGATTACACAGATTAC

1. Find Overlapping Reads
One caveat: repeats
A k-mer that appears N times initiates N^2 comparisons.
ALU: 1,000,000 times
Solution: Discard all k-mers that appear more than c × coverage (c ~ 10)

1. Find Overlapping Reads
Create local multiple alignments from the overlapping reads
  TAGATTACACAGATTACTGA
  TAGATTACACAGATTACTGA
  TAG TTACACAGATTATTGA
  TAGATTACACAGATTACTGA
  TAGATTACACAGATTACTGA
  TAGATTACACAGATTACTGA
  TAG TTACACAGATTATTGA
  TAGATTACACAGATTACTGA

1. Find Overlapping Reads (cont'd)
• Correct errors using multiple alignment
  TAGATTACACAGATTACTGA
  TAGATTACACAGATTACTGA
  TAG TTACACAGATTATTGA
  TAGATTACACAGATTACTGA
  TAGATTACACAGATTACTGA
  Column with a substitution (base: quality, one per read)
    before: C: 20, C: 35, T: 30, C: 35, C: 40
    after:  C: 20, C: 35, C: 0,  C: 35, C: 40
  Column with a gap
    before: A: 15, A: 25, A: 40, A: 25, -
    after:  A: 15, A: 25, A: 40, A: 25, A: 0
• Score alignments
• Accept alignments with good scores

2. Merge Reads into Contigs
Merge reads up to potential repeat boundaries (figure: repeat region)

Repeats, errors, and contig lengths
• Repeats shorter than read length are OK
• Repeats whose copies differ at a higher rate than the sequencing error rate are OK
• To make a smaller portion of the genome appear repetitive, try to:
    Increase read length
    Decrease sequencing error rate
Role of error correction:
  Discards ~90% of single-letter sequencing errors
  decreases error rate → decreases effective repeat content → increases contig length

2. Merge Reads into Contigs
• Ignore non-maximal reads
• Merge only maximal reads into contigs
(figure: repeat region)

2. Merge Reads into Contigs
• Ignore "hanging" reads when detecting repeat boundaries
(figure: a sequencing error vs. a true repeat boundary, reads a and b)
• Insert non-maximal reads whenever unambiguous

3. Link Contigs into Supercontigs
Find all links between unique contigs
• Normal density
• Too dense: overcollapsed? (Myers et al. 2000)
• Inconsistent links: overcollapsed?

3. Link Contigs into Supercontigs
Connect contigs incrementally, if ≥ 2 links
Fill gaps in supercontigs with paths of overcollapsed contigs

3. Link Contigs into Supercontigs
Define G = (V, E)
  V := contigs
  E := {(A, B) : d(A, B) < C}
Reason to do so: efficiency; full shortest paths cannot be computed
(figure: gap d(A, B) between Contig A and Contig B)
Define T: contigs linked to either A or B
Fill the gap between A and B if there is a path in G passing only through contigs in T

4. Derive Consensus Sequence
Derive multiple alignment from pairwise read alignments
  TAGATTACACAGATTACTGA TTGATGGCGTAA CTA
  TAGATTACACAGATTACTGACTTGATGGCGTAAACTA
  TAG TTACACAGATTATTGACTTCATGGCGTAA CTA
  TAGATTACACAGATTACTGACTTGATGGCGTAA CTA
  TAGATTACACAGATTACTGACTTGATGGGGTAA CTA
  TAGATTACACAGATTACTGACTTGATGGCGTAA CTA
Derive each consensus base by weighted voting

Simulated Whole Genome Shotgun
• Known genomes: Flu, yeast, fly, Human chromosomes 21, 22
• Make "realistic" shotgun reads
• Run ARACHNE
• Align output with genome and compare

Making a Simulated Read
Simulated reads have error patterns taken from random real reads
(figure: real read + artificial shotgun read → ERRORIZER → simulated read)

Human 22, Results of Simulations
  Plasmid / Cosmid coverage   10 X / 0.5 X   5 X / 0.5 X   3 X / 0 X
  N50 contig                  353 Kb         15 Kb         2.7 Kb
  Mean contig                 142 Kb         10.6 Kb       2.0 Kb
  N50 scaffold                3 Mb           3 Mb          4.1 Kb
  Avg base quality            41             32            26
  % > 2 kb                    97.3           91.1          67

Neurospora crassa Genome (Real Data)
• 40 Mb genome, shotgun sequencing complete (WI-CGR)
Coverage:
  1705 contigs
  368 supercontigs
• 1% uncovered (of finished BACs)
• Evaluated assembly using 1.5 Mb of finished BACs
Efficiency:
  Time: 20 hr
  Memory: 9 Gb
Accuracy:
  < 3 misassemblies compared with 1 Gb of finished sequence
  Errors per 10^6 letters: substitutions 260, indels 164

Mouse Genome
An improved version of ARACHNE assembled the mouse genome.
Several heuristics, applied iteratively:
  Breaking supercontigs that are suspicious
  Rejoining supercontigs
Size of problem: 32,000,000 reads
Time: 15 days, 1 processor
Memory: 28 Gb
N50 contig size: 16.3 Kb → 24.8 Kb
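The overlap-detection step on the "1. Find Overlapping Reads" slides (sort all k-mers, pair up reads that share one, and discard any k-mer seen more than c × coverage times so that repeats such as ALU do not trigger N^2 comparisons) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not ARACHNE's code: the function name `candidate_overlaps` is hypothetical, `coverage` must be supplied by the caller, and the later extension to a full alignment with the >95% similarity cutoff is left out.

```python
from itertools import combinations, groupby


def candidate_overlaps(reads, k, coverage, c=10):
    """Hypothetical helper: return sorted pairs of read indices sharing a k-mer.

    k-mers occurring more than c * coverage times are treated as repeats
    and skipped, since a k-mer seen N times would start ~N^2 comparisons.
    """
    # Collect (k-mer, read index) for every k-mer position in every read.
    occurrences = [
        (read[i:i + k], idx)
        for idx, read in enumerate(reads)
        for i in range(len(read) - k + 1)
    ]
    occurrences.sort()  # identical k-mers become adjacent, as on the slide

    max_count = c * coverage
    pairs = set()
    for _, group in groupby(occurrences, key=lambda occ: occ[0]):
        hits = [idx for _, idx in group]
        if len(hits) > max_count:
            continue  # likely a repeat: discard this k-mer entirely
        # Every pair of distinct reads sharing this k-mer is a candidate.
        for a, b in combinations(sorted(set(hits)), 2):
            pairs.add((a, b))
    return sorted(pairs)


reads = ["TAGATTACACAGATTAC", "ACACAGATTACTGA", "GGGCCCTTT"]
print(candidate_overlaps(reads, k=5, coverage=1))
```

Sorting mirrors the slide's description; a hash map from k-mer to read list would find the same candidate pairs. Each candidate would then go on to full alignment.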

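The gap-filling rule on the "3. Link Contigs into Supercontigs" slides (fill the gap between contigs A and B only if G contains a path from A to B passing only through contigs in T, the contigs linked to A or B) can be read as a search restricted to T. The sketch below assumes G is given as an adjacency dictionary whose edges already satisfy d(A, B) < C, and that read-pair links are given as a separate dictionary; the names `fill_gap`, `graph`, and `links` are hypothetical. Plain breadth-first search, which returns a path with the fewest contigs, is a simplification: the slides do not say how ARACHNE chooses among several valid paths.

```python
from collections import deque


def fill_gap(graph, links, a, b):
    """Hypothetical helper: return a path a -> ... -> b in `graph` whose
    interior contigs all lie in T, or None if no such path exists.

    graph: dict contig -> iterable of neighbours (edges with d(A, B) < C)
    links: dict contig -> set of contigs joined to it by read-pair links
    """
    # T: contigs linked to either A or B; B itself must also be enterable.
    allowed = set(links.get(a, ())) | set(links.get(b, ()))
    allowed.add(b)

    parent = {a: None}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            # Walk parent pointers back to A to recover the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            # Only step onto contigs in T that have not been visited yet.
            if nxt in allowed and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


graph = {"A": ["X", "Y"], "X": ["B"], "Y": ["B"], "B": []}
print(fill_gap(graph, {"A": {"X"}, "B": set()}, "A", "B"))
```

Restricting the search to T keeps it local, which matches the slide's point that full shortest paths over all contigs are too expensive to compute.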

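Both the error-correction slide ("Correct errors using multiple alignment") and the consensus slide ("Derive each consensus base by weighted voting") decide each alignment column by a vote. A minimal Python sketch, assuming each aligned read is a string that uses a space for a gap, and that each base votes with its quality score as weight (uniform weights if none are given). The function name `weighted_consensus` and the rule that a winning gap is simply dropped from the consensus are assumptions for illustration, not ARACHNE's documented behavior.

```python
from collections import defaultdict


def weighted_consensus(rows, quals=None, gap=" "):
    """Hypothetical helper: consensus of equal-length aligned rows.

    rows:  aligned reads; `gap` marks an alignment gap
    quals: optional per-position weights, same shape as rows (default 1)
    """
    if not rows:
        return ""
    if quals is None:
        quals = [[1] * len(r) for r in rows]
    consensus = []
    for col in range(len(rows[0])):
        votes = defaultdict(int)
        for row, q in zip(rows, quals):
            votes[row[col]] += q[col]  # each base votes with its weight
        # Ties go to the symbol seen first in the column.
        winner = max(votes, key=votes.get)
        if winner != gap:
            consensus.append(winner)  # a winning gap adds no base
    return "".join(consensus)


# The substitution column from the error-correction slide: C totals 130, T 30.
print(weighted_consensus(["C", "C", "T", "C", "C"], [[20], [35], [30], [35], [40]]))
```

For error correction, the same vote picks the column's base, and a read whose base loses could be overwritten with the winner; the slide shows the corrected base's quality then set to 0.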