Stanford CS 374 - DNA Sequencing and Assembly

This preview shows page 1-2-3-20-21-22-41-42-43 out of 43 pages.

Slide titles:
DNA Sequencing and Assembly
DNA sequencing
Which representative of the species?
DNA sequencing – vectors
Different types of vectors
DNA sequencing – gel electrophoresis
Electrophoresis diagrams
Output of gel electrophoresis: a read
Method to sequence segments longer than 500
Slide 10
Definition of Coverage
Slide 12
Repeats
What can we do about repeats?
Slide 15
Strategies for sequencing a whole genome
Hierarchical Sequencing
Hierarchical Sequencing Strategy
Methods of physical mapping
1. Hybridization
Hybridization – Computational Challenge
2. Digestion
Whole-Genome Shotgun Sequencing
Whole Genome Shotgun Sequencing
The Overlap-Layout-Consensus approach
1. Find Overlapping Reads
Slide 27
Slide 28
1. Find Overlapping Reads (cont’d)
Basic principle of assembly
2. Merge Reads into Contigs (cont’d)
Slide 32
Slide 33
Slide 34
Slide 35
Slide 36
Slide 37
Slide 38
Slide 39
4. Derive Consensus Sequence
Mouse Genome
Mouse Assembly
Sequencing in the (near) future

DNA Sequencing and Assembly

DNA sequencing
How we obtain the sequence of nucleotides of a species
…ACGTGACTGAGGACCGTGCGACTGAGACTGACTGGGTCTAGCTAGACTACGTTTTATATATATATACGTCGTCGTACTGATGACTAGATTACAGACTGATTTAGATACCTGACTGATTTTAAAAAAATATT…

Which representative of the species?
Which human?
Answer one:
Answer two: it doesn’t matter
Polymorphism rate: number of letter changes between two different members of a species
Humans: ~1/1,000 – 1/10,000
Other organisms have much higher polymorphism rates

DNA sequencing – vectors
[Figure: DNA is shaken into DNA fragments; fragments + vector (circular genome: bacterium, plasmid) at a known location (restriction site)]

Different types of vectors
VECTOR                                  Size of insert (bp)
Plasmid                                 2,000-10,000 (can control the size)
Cosmid                                  40,000
BAC (Bacterial Artificial Chromosome)   70,000-300,000
YAC (Yeast Artificial Chromosome)       > 300,000 (not used much recently)

DNA sequencing – gel electrophoresis
Start at primer (restriction site)
Grow DNA chain
Include dideoxynucleoside (modified a, c, g, t)
Stops reaction at all possible points
Separate products by length, using gel electrophoresis

Electrophoresis diagrams

Output of gel electrophoresis: a read
A read: 500-700 nucleotides
A  C  G  A  A  T  C  A  G  …  A
16 18 21 23 25 15 28 30 32    21
Quality scores: -10 log10 Prob(Error)
Reads can be obtained from the leftmost and rightmost ends of the insert
Double-barreled sequencing: both leftmost & rightmost ends are sequenced

Method to sequence segments longer than 500
[Figure: a genomic segment is cut many times at random (Shotgun)]
Get one or two reads from each segment (~500 bp each)

Reconstructing the Sequence (Fragment Assembly)
Cover region with ~7-fold redundancy (7X)
Overlap reads and extend to reconstruct the original genomic region

Definition of Coverage
Length of genomic segment: L
Number of reads: n
Length of each read: l
Definition: Coverage C = nl/L
How much coverage is enough?
(Lander-Waterman model): Assuming uniform distribution of reads, C = 10 results in 1 gapped region / 1,000,000 nucleotides

Challenges with Fragment Assembly
• Sequencing errors: ~1-2% of bases are wrong
• Repeats
• Computation: ~ O(N^2) where N = # reads
[Figure: false overlap due to repeat]

Repeats
Bacterial genomes: 5%
Mammals: 50%
Repeat types:
• Low-Complexity DNA (e.g. ATATATATACATA…)
• Microsatellite repeats: (a1…ak)^N where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG)
• Common Repeat Families
  SINE (Short Interspersed Nuclear Elements) (e.g. ALU: ~300-long, 10^6 copies)
  LINE (Long Interspersed Nuclear Elements) ~500-5,000-long, 200,000 copies
  MIR
  LTR/Retroviral
• Other
  - Genes that are duplicated & then diverge (paralogs)
  - Recent duplications, ~100,000-long, very similar copies

What can we do about repeats?
Two main approaches:
• Cluster the reads
• Link the reads

Strategies for sequencing a whole genome
1. Hierarchical – Clone-by-clone
   i. Break genome into many long pieces
   ii. Map each long piece onto the genome
   iii. Sequence each piece with shotgun
   Example: Yeast, Worm, Human, Rat
2. Online version of (1) – Walking
   i. Break genome into many long pieces
   ii. Start sequencing each piece with shotgun
   iii. Construct map as you go
   Example: Rice genome
3. Whole genome shotgun
   One large shotgun pass on the whole genome
   Example: Drosophila, Human (Celera), Neurospora, Mouse, Rat, Fugu

Hierarchical Sequencing

Hierarchical Sequencing Strategy
1. Obtain a large collection of BAC clones
2. Map them onto the genome (Physical Mapping)
3. Select a minimum tiling path
4. Sequence each clone in the path with shotgun
5. Assemble
6. Put everything together
[Figure labels: a BAC clone, map, genome]

Methods of physical mapping
Goal: Make a map of the locations of each clone relative to one another. Use the map to select a minimal set of clones to sequence.
Methods:
• Hybridization
• Digestion

1. Hybridization
Short words, the probes, attach to complementary words
1. Construct many probes
2. Treat each BAC with all probes
3. Record which ones attach to it
4. Same words attaching to BACs X, Y ⇒ overlap
[Figure labels: probes p1 … pn]

Hybridization – Computational Challenge
Matrix: m probes × n clones
(i, j): 1, if pi hybridizes to Cj; 0, otherwise
Definition: Consecutive ones matrix
A matrix where the 1s are consecutive
Computational problem: Reorder the probes so that the matrix is in consecutive-ones form
Can be solved in O(m^3) time (m >> n)
Unfortunately, data is not perfect

Before reordering (labels: p1 p2 … pm; C1 C2 … Cn):
1 0 1 … 0
1 1 0 … 0
0 0 1 … 1

After reordering (labels: pi1 pi2 … pim; Cj1 Cj2 … Cjn):
1 1 1 0 0 0 … 0
0 1 1 1 1 1 … 0
0 0 1 1 1 0 … 0
0 0 0 0 0 0 … 1 1 1 0
0 0 0 0 0 0 … 0 1 1 1

2. Digestion
Restriction enzymes cut DNA where specific words appear
1. Cut each clone separately with an enzyme
2. Run fragments on a gel and measure length
3. Clones Ca, Cb have fragments of length { li, lj, lk } ⇒ overlap
Double digestion: Cut with enzyme A, enzyme B, then enzymes A + B

Whole-Genome Shotgun Sequencing

Whole Genome Shotgun Sequencing
[Figure: the genome is cut many times at random into plasmids (2 – 10 Kbp) and cosmids (40 Kbp); forward-reverse linked reads of ~500 bp each, a known distance apart]

The Overlap-Layout-Consensus approach
1. Find overlapping reads
2. Merge good pairs of reads into longer contigs
3. Link contigs to form supercontigs
4. Derive consensus sequence
   ..ACGATTACAATAGGTT..
+ many heuristics

1. Find Overlapping Reads
• Sort all k-mers in reads (k ~ 24)
  TAGATTACACAGATTAC
  |||||||||||||||||
  TAGATTACACAGATTAC
• Find pairs of reads sharing a k-mer
• Extend to full alignment – throw away if not >95% similar
[Figure: read alignment example]

1. Find
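The first two bullets of "1. Find Overlapping Reads" (sort all k-mers, then find pairs of reads sharing a k-mer) can be sketched roughly as follows. This is a minimal illustration, not the lecture's implementation: the function name `candidate_pairs` and the toy reads are assumptions, a small k is used so the example fits on screen (the slides suggest k ~ 24), and the later step of extending to a full alignment with the >95% similarity filter is not shown.

```python
from itertools import combinations, groupby


def candidate_pairs(reads, k):
    """Return pairs of read indices that share at least one exact k-mer.

    Hypothetical helper: every (k-mer, read index) tuple is sorted so that
    equal k-mers become adjacent, then each run of equal k-mers is scanned.
    """
    kmers = sorted(
        (read[i:i + k], rid)
        for rid, read in enumerate(reads)
        for i in range(len(read) - k + 1)
    )
    pairs = set()
    for _, group in groupby(kmers, key=lambda item: item[0]):
        # Distinct reads containing this k-mer, in increasing index order.
        ids = sorted({rid for _, rid in group})
        pairs.update(combinations(ids, 2))
    return pairs


# Toy reads (made up for illustration); reads 0 and 1 share "ACACAGAT".
reads = ["TAGATTACACAGATTAC", "ACACAGATTACTGA", "GGGGCCCCTTTT"]
shared = candidate_pairs(reads, k=8)
```

Sorting keeps the work close to the size of the k-mer list, which avoids comparing every read against every other read. A k-mer from a repeat would pair many unrelated reads, which is one reason the candidates still need the alignment check.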

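The coverage definition C = nl/L and the quality-score formula -10 log10 Prob(Error) from the slides can be checked numerically with a short sketch. The gap estimate below assumes the simplest Lander-Waterman form, where the expected number of contigs is about n·e^(-C) with no minimum-overlap requirement; the function names and the 500-nucleotide read length are illustrative choices rather than values fixed by the lecture.

```python
import math


def coverage(num_reads, read_len, segment_len):
    """Coverage C = n * l / L, as defined on the slide."""
    return num_reads * read_len / segment_len


def expected_gaps_per_base(cov, read_len):
    """Expected gapped regions per nucleotide under the simplified
    Lander-Waterman model: n * exp(-C) / L = (C / l) * exp(-C)."""
    return (cov / read_len) * math.exp(-cov)


def error_probability(quality):
    """Invert Q = -10 * log10(Prob(Error))."""
    return 10 ** (-quality / 10)


c = coverage(num_reads=20_000, read_len=500, segment_len=1_000_000)
bases_per_gap = 1 / expected_gaps_per_base(c, read_len=500)
p_err = error_probability(20)
```

With 500-nucleotide reads at C = 10, `bases_per_gap` comes out at roughly 1.1 million, which is consistent with the slide's figure of about one gapped region per 1,000,000 nucleotides.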

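The consecutive-ones problem from "Hybridization – Computational Challenge" can be illustrated on a tiny matrix. The sketch below assumes rows are clones and columns are probes, and it tries every probe order by brute force. That is only workable for a handful of probes and is not the O(m^3) method the slides refer to; the function names are hypothetical.

```python
from itertools import permutations


def row_is_consecutive(row):
    """True if all 1s in the row form one contiguous block (or there are none)."""
    ones = [i for i, value in enumerate(row) if value == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def in_consecutive_ones_form(matrix):
    """True if every row (clone) has its 1s in consecutive columns (probes)."""
    return all(row_is_consecutive(row) for row in matrix)


def find_probe_order(matrix):
    """Brute-force search for a column order giving consecutive-ones form.

    Returns a tuple of column indices, or None if no order works.
    Exponential in the number of columns: for illustration only.
    """
    num_cols = len(matrix[0])
    for order in permutations(range(num_cols)):
        reordered = [[row[j] for j in order] for row in matrix]
        if in_consecutive_ones_form(reordered):
            return order
    return None


# Toy hybridization data (made up): 3 clones x 4 probes.
hits = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
order = find_probe_order(hits)
```

A return value of `None` means that no ordering of the probes explains the data, which is the situation the slide's remark "data is not perfect" points at.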