Stanford CS 374 - DNA Sequencing

This preview shows page 1-2-3-20-21-40-41-42 out of 42 pages.


DNA Sequencing

Next few topics
• DNA Sequencing
    Sequencing strategies
      • Hierarchical
      • Online (Walking)
      • Whole Genome Shotgun
    Sequencing Assembly
• Gene Recognition
    The GENSCAN hidden Markov model
    Comparative Gene Recognition – Twinscan, SLAM
• Large-scale and multiple sequence alignment
• Microarrays, Regulation, and Motif-finding
• Evolution and Phylogeny
• RNA Structure and Modeling

New topic: DNA sequencing
How we obtain the sequence of nucleotides of a species
…ACGTGACTGAGGACCGTGCGACTGAGACTGACTGGGTCTAGCTAGACTACGTTTTATATATATATACGTCGTCGTACTGATGACTAGATTACAGACTGATTTAGATACCTGACTGATTTTAAAAAAATATT…

Which representative of the species?
Which human?
Answer one:
Answer two: it doesn’t matter
Polymorphism rate: number of letter changes between two different members of a species
Humans: ~1/1,000
Other organisms have much higher polymorphism rates

Why humans are so similar
A small population that interbred reduced the genetic variation
Out of Africa ~ 100,000 years ago

Migration of human variation
http://info.med.yale.edu/genetics/kkidd/point.html

DNA Sequencing
Goal: Find the complete sequence of A, C, G, T’s in DNA
Challenge: There is no machine that takes long DNA as an input and gives the complete sequence as output
Can only sequence ~500 letters at a time

DNA sequencing – vectors
[Figure: DNA is shaken into DNA fragments; fragment + vector (circular genome: bacterium, plasmid), joined at a known location (restriction site)]

Different types of vectors
VECTOR                                   Size of insert (bp)   Notes
Plasmid                                  2,000-10,000          Can control the size
Cosmid                                   40,000
BAC (Bacterial Artificial Chromosome)    70,000-300,000
YAC (Yeast Artificial Chromosome)        > 300,000             Not used much recently

DNA sequencing – gel electrophoresis
1. Start at primer (restriction site)
2. Grow DNA chain
3. Include dideoxynucleoside (modified a, c, g, t)
4. Stops reaction at all possible points
5. Separate products by length, using gel electrophoresis

Electrophoresis diagrams
[Figure: electrophoresis diagrams]

Challenging to read answer
[Figure: electropherogram traces that are hard to read]

Reading an electropherogram
1. Filtering
2. Smoothing
3. Correction for length compressions
4. A method for calling the letters – PHRED
PHRED – PHil’s Read EDitor (by Phil Green)
Based on dynamic programming
Several better methods exist, but labs are reluctant to change

Output of PHRED: a read
A read: 500-700 nucleotides
A  C  G  A  A  T  C  A  G  …  A
16 18 21 23 25 15 28 30 32 …  21
Quality scores: -10 log10 Prob(Error)
Reads can be obtained from the leftmost and rightmost ends of the insert
Double-barreled sequencing: both leftmost & rightmost ends are sequenced

Method to sequence longer regions
Cut the genomic segment many times at random (Shotgun)
Get one or two reads (~500 bp each) from each segment

Reconstructing the Sequence (Fragment Assembly)
Cover region with ~7-fold redundancy (7X)
Overlap reads and extend to reconstruct the original genomic region

Definition of Coverage
Length of genomic segment: L
Number of reads: n
Length of each read: l
Definition: Coverage C = n l / L
How much coverage is enough?
Lander-Waterman model: assuming uniform distribution of reads, C = 10 results in 1 gapped region per 1,000,000 nucleotides

Challenges with Fragment Assembly
• Sequencing errors: ~1-2% of bases are wrong
• Repeats (a repeat can cause a false overlap)
• Computation: ~ O(N^2) where N = # reads

Repeats
Bacterial genomes: 5%
Mammals: 50%
Repeat types:
• Low-Complexity DNA (e.g. ATATATATACATA…)
• Microsatellite repeats (a1…ak)^N where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG)
• Transposons
    SINE (Short Interspersed Nuclear Elements), e.g., ALU: ~300-long, 10^6 copies
    LINE (Long Interspersed Nuclear Elements): ~500-5,000-long, 200,000 copies
    LTR retroposons (Long Terminal Repeats (~700 bp) at each end): cousins of HIV
• Gene Families: genes duplicate & then diverge (paralogs)
• Recent duplications: ~100,000-long, very similar copies

Strategies for whole-genome sequencing
1. Hierarchical – Clone-by-clone (yeast, worm, human)
    i. Break genome into many long fragments
    ii. Map each long fragment onto the genome
    iii. Sequence each fragment with shotgun
2. Online version of (1) – Walking (rice genome)
    i. Break genome into many long fragments
    ii. Start sequencing each fragment with shotgun
    iii. Construct map as you go
3. Whole Genome Shotgun (fly, human, mouse, rat, fugu)
    One large shotgun pass on the whole genome

Hierarchical Sequencing Strategy
1. Obtain a large collection of BAC clones
2. Map them onto the genome (Physical Mapping)
3. Select a minimum tiling path
4. Sequence each clone in the path with shotgun
5. Assemble
6. Put everything together
[Figure: a BAC clone, the map, and the genome]

Methods of physical mapping
Goal:
• Map the clones relative to one another
• Use the map to select a minimal tiling set of clones to sequence
Methods:
• Hybridization
• Digestion

1. Hybridization
Short words, the probes, attach to complementary words
1. Construct many probes p1, p2, …, pn
2. Treat each clone Ci with all probes
3. Record all attachments (Ci, pj)
4. Same words attaching to clones X, Y ⇒ overlap

Hybridization – Computational Challenge
Matrix: m probes × n clones
(i, j): 1 if pi hybridizes to Cj; 0 otherwise
Definition: Consecutive ones matrix: 1s are consecutive in each row & col
Computational problem: Reorder the probes so that the matrix is in consecutive-ones form
Can be solved in O(m^3) time (m > n)
[Figure: example 0/1 matrix over probes p1 … pm and clones C1 … Cn]
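
The quality-score definition on the read slide, Q = -10 log10 Prob(Error), can be inverted to recover an error probability for each base. The sketch below is illustrative only: the function names are made up for this example, and the scores are the ones printed on the slide.

```python
import math


def error_probability(q):
    """Invert Q = -10 * log10(P): the probability that a base call is wrong."""
    return 10 ** (-q / 10)


def quality_score(p):
    """Quality score for a given error probability, per the slide's definition."""
    return -10 * math.log10(p)


# Quality scores shown on the slide for the first bases of the example read.
scores = [16, 18, 21, 23, 25, 15, 28, 30, 32]
probs = [error_probability(q) for q in scores]

# Summing per-base error probabilities gives the expected number of wrong
# calls among these nine bases.
expected_errors = sum(probs)

for q, p in zip(scores, probs):
    print(f"Q={q:2d}  P(error)={p:.4f}")
print(f"expected errors in these 9 bases: {expected_errors:.3f}")
```

A score of 30 therefore corresponds to roughly one wrong call in 1,000, and a score of 20 to one in 100.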
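
The "overlap reads and extend" step of fragment assembly, and the ~O(N^2) cost of comparing every pair of reads, can be sketched as follows. This is a toy illustration under strong assumptions (error-free reads, exact suffix/prefix matches, no repeats, one strand); the helper names `overlap` and `all_overlaps` and the three short reads are invented for the example and are not from the lecture.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no such match of at least min_len letters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def all_overlaps(reads, min_len=3):
    """Compare every ordered pair of reads: N * (N - 1) comparisons."""
    found = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap(a, b, min_len)
                if k > 0:
                    found[(i, j)] = k
    return found


# Three toy reads cut from the start of the slide's example sequence.
reads = ["ACGTGAC", "TGACTGA", "CTGAGGA"]
overlaps = all_overlaps(reads)

# Extend read 0 with read 1, then with read 2, dropping the overlapping prefix.
merged = reads[0]
merged += reads[1][overlaps[(0, 1)]:]
merged += reads[2][overlaps[(1, 2)]:]
print(overlaps)
print(merged)
```

With real data, sequencing errors require approximate rather than exact matching, and repeats produce false overlaps, which is why the slides list both as challenges.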
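
The coverage definition C = n l / L and the Lander-Waterman figure quoted for C = 10 can be checked numerically. The sketch below assumes the simplified form of the model in which the expected number of gaps is about n · e^(-C), ignoring the minimum overlap needed to join two reads; the read length of 500 is taken from the "~500 letters at a time" figure, and the function names are illustrative.

```python
import math


def coverage(n_reads, read_len, segment_len):
    """Coverage C = n * l / L."""
    return n_reads * read_len / segment_len


def expected_gaps(n_reads, cov):
    """Simplified Lander-Waterman estimate: about n * exp(-C) gaps.

    Assumes reads start uniformly at random and ignores the minimum
    overlap required to detect that two reads are adjacent.
    """
    return n_reads * math.exp(-cov)


segment_len = 1_000_000   # L: one million nucleotides
read_len = 500            # l: typical read length from the slides
n_reads = 20_000          # n: chosen so that C = 10

C = coverage(n_reads, read_len, segment_len)
gaps = expected_gaps(n_reads, C)
print(f"C = {C:.1f}, expected gaps per {segment_len:,} nucleotides = {gaps:.3f}")
```

Under these assumptions the estimate comes out at a little under one gap per million nucleotides, which agrees with the slide's statement.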
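
The computational problem on the hybridization slide, reordering the probes so that the 1s line up consecutively, can be made concrete on a tiny matrix. The sketch below is a brute-force search over all probe orderings and is not the O(m^3) method the slide refers to; it only checks that each clone's 1s are consecutive after the probes are reordered. The function names and the 4 × 3 example matrix are invented for illustration.

```python
from itertools import permutations


def ones_are_consecutive(bits):
    """True if all 1s in the sequence form a single unbroken run."""
    positions = [i for i, b in enumerate(bits) if b == 1]
    return not positions or positions[-1] - positions[0] + 1 == len(positions)


def find_probe_order(matrix):
    """Find a probe (row) order giving every clone (column) consecutive 1s.

    matrix[i][j] == 1 means probe i hybridizes to clone j.
    Tries every permutation, so it is only usable for a handful of probes.
    Returns a tuple of row indices, or None if no ordering works.
    """
    n_clones = len(matrix[0]) if matrix else 0
    for order in permutations(range(len(matrix))):
        columns = ([matrix[i][j] for i in order] for j in range(n_clones))
        if all(ones_are_consecutive(col) for col in columns):
            return order
    return None


example = [
    [1, 0, 0],  # probe 0 attaches to clone 0
    [0, 1, 1],  # probe 1 attaches to clones 1 and 2
    [1, 1, 0],  # probe 2 attaches to clones 0 and 1
    [0, 0, 1],  # probe 3 attaches to clone 2
]
order = find_probe_order(example)
print(order)
for i in order:
    print(example[i])
```

In the reordered matrix, clones that share a probe sit next to each other, which is the overlap information the physical map is built from.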

