Stanford CS 262 - DNA Sequencing

DNA Sequencing
2/11/2008

The Task

The task of DNA sequencing is to retrieve the sequence of nucleotide bases that make up DNA. In humans, this means reading the 3 billion or so bases that together serve as our genomic blueprint.

Representative Sequence

When we say that the human genome has been sequenced, whose DNA did we actually sequence? For the most part, it doesn’t matter, because humans are as a rule 99.9% similar. The small differences vary from person to person, and when inferring the characteristics of a species or comparing multiple species, the genome of one specific individual is unnecessary. Even so, in 1999 Celera Genomics’ sequencing efforts used the DNA of its president and founder, the infamous Craig Venter.

The variability within a species is described by its genome’s polymorphism rate. This rate is defined as the fraction of positions at which the genomes of two different members of a species differ. For humans, this rate is approximately 1/1000. The sea squirt, Ciona, has a much higher polymorphism rate at about 5%.

How much these differences matter depends on the research task. In pharmacogenomics, the goal is to tailor drugs to an individual’s genotype in order to optimize efficacy and minimize side effects. In such a context, knowing what the individual differences are and where they occur becomes significant.

Human Population Migrations

There are two main theories of human population migration: 1) Out-of-Africa, 2) Multiregional Evolution. The latter has since been discredited. The former dates the migration out of Africa to approximately 40,000 years ago. Examining maternally inherited mitochondrial DNA (mtDNA) and building a phylogenetic tree for all mtDNA shows the descent of all humans from a single female (Eve) who lived approximately 150,000 years ago.
Similarly, building an evolutionary tree for Y-chromosomes reveals that all men descended from a single individual (Adam) who lived approximately 70,000 years ago.

Variability Model

Assuming a scenario with no selection, a small fixed population size, and random mating:

H = 4Nμ / (1 + 4Nμ)

where H is the heterozygosity, or the average number of differences per base between individuals; N is the population size; and μ is the mutation rate per base per generation. For humans, taking H to be 1/1000 and μ to be roughly 1 in 100 million bases, solving for N gives N ≈ H / (4μ) ≈ 25,000. The effective population size 40,000 years ago, before humans left Africa, was therefore on the order of 10,000 people. This is far smaller than the current worldwide population of approximately 6 billion. In general, the mutation rate is very similar across different organisms because their cellular machinery is similar. The high polymorphism of organisms like Ciona is therefore driven mostly by their population size.

Validating the Out-of-Africa Theory

When examining migration patterns, we expect variability to decrease as geographic distance from the origin (adjusted for large bodies of water) increases. In other words, variability declines the farther a population has moved from the origin. Using various locations worldwide as candidate starting points and plotting the resulting correlations on a map reveals that Africa consistently has the highest correlation values and South America the lowest. This supports the theory that humans migrated from Africa and settled South America last.

DNA Sequencing Methods

To reiterate, the goal of sequencing is to find the sequence of A, T, C, G nucleotide bases that make up a genome. As of now, there is no machine that takes in a single length of DNA and outputs the complete sequence.
The best modern machines rely on a process that randomly cuts up the genome, reads 500 to 1000 bases at a time, and stitches the reads together in assembly.

Terminology

An insert is a fragment that is incorporated into a circular genome at known restriction sites. Vectors are circular genome hosts for inserts. The idea is to hijack bacterial cellular machinery to amplify the relevant DNA fragments. Vectors range in size from small plasmids (2-10 kb) to BACs (Bacterial Artificial Chromosomes), a large type of insert-vector combination that is usually 100-200 kb in length.

Sanger Sequencing

Invented by Fred Sanger in 1977, the chain termination method has become the classic standard for DNA sequencing. The general procedure involves taking a piece of DNA and amplifying it in a vector. Starting from primers at the restriction site, the DNA chain is grown using a mixture of ordinary deoxynucleotides and dideoxynucleotides. The latter are incorporated into the growing chains by chance and terminate the lengthening reaction, resulting in sequences of random length. Given enough amplification, the randomness ensures sequences of all lengths up to a certain cutoff.

The partial sequences are then put through gel electrophoresis, which separates molecules by mobility. Mobility depends on molecular size or, in this case, sequence length. A laser reads the fluorescent tags on the terminating dideoxynucleotides for the base calls. The raw data is rather noisy and requires post-processing to 1) filter, 2) smooth peaks, 3) correct for length compressions, and 4) call bases (PHRED, PHil’s Read EDitor). PHRED outputs a sequence of base calls, each scored according to the confidence of the call. More specifically, the quality score of a call is q = -10 × log10(p), where p is the estimated probability that the call is wrong.

Reads are limited to a length of approximately 1000 bases for two reasons: electrophoresis separates fragments less well as they get longer, and long fragments are less likely to arise in the first place given the random incorporation of dideoxynucleotides.
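To make the PHRED confidence scores mentioned above concrete, here is a minimal sketch of the conventional quality scale, q = -10 × log10(p), where p is the estimated probability that a base call is wrong. The function names are hypothetical and chosen for illustration only; they are not part of PHRED's actual interface.

```python
import math
from typing import Iterable


def phred_quality(p_error: float) -> float:
    """Convert an error probability into a quality score, q = -10 * log10(p)."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in the interval (0, 1]")
    return -10.0 * math.log10(p_error)


def error_probability(quality: float) -> float:
    """Invert the scale: p = 10 ** (-q / 10)."""
    return 10.0 ** (-quality / 10.0)


def expected_errors(qualities: Iterable[float]) -> float:
    """Expected number of wrong calls in a read, summing per-base error probabilities."""
    return sum(error_probability(q) for q in qualities)


if __name__ == "__main__":
    # A 1-in-1000 chance of a wrong call corresponds to a quality of about 30.
    print(round(phred_quality(0.001), 2))
    # A hypothetical short read whose quality drops toward the end.
    print(round(expected_errors([40, 40, 30, 20, 10]), 4))
```

Under this scale, each 10-point increase in quality corresponds to a tenfold drop in the estimated error probability, which is why summing the per-base probabilities gives a quick estimate of how many calls in a read are likely to be wrong.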
Other Alternative Methods

Pyrosequencing uses beads with DNA attached, as opposed to vectors, for direct replication. Every time a base is incorporated, light is given off. The procedure cycles through passes of A’s, T’s, G’s, and C’s, where multiple additions of a given base during a pass result in more light being given off. This method offers the advantages of being cheaper and faster than Sanger sequencing, but the reads

