Stanford CS 262 - Lecture Notes


Against a Whole-Genome Shotgun

Philip Green1
Department of Molecular Biotechnology, University of Washington, Seattle, Washington 98195

1E-MAIL [email protected]; FAX (206) 685-7344.

PERSPECTIVE
GENOME RESEARCH 7:410–417 ©1997 by Cold Spring Harbor Laboratory Press ISSN 1054-9803/97 $5.00

The human genome project is entering its decisive final phase, in which the genome sequence will be determined in large-scale efforts in multiple laboratories worldwide. A number of sequencing groups are in the process of scaling up their throughput; over the next few years they will need to attain a collective capacity approaching half a gigabase per year to complete the 3-Gb genome sequence by the target date of 2005. At present, all contributing groups are using a clone-by-clone approach, in which mapped bacterial clones (typically 40–400 kb in size) from known chromosomal locations are sequenced to completion. Among other advantages, this permits a variety of alternative sequencing strategies and methods to be explored independently without redundancy of effort. Although it is not too late to consider implementing a different approach, any such approach must have as high a probability of success as the current one and offer significant advantages (such as decreased cost). I argue here that the whole-genome shotgun proposed by Weber and Myers satisfies neither condition.

Clone-by-Clone Sequencing

For purposes of comparison it is helpful to first outline a specific implementation of clone-by-clone sequencing. Although by no means the only one possible, this implementation is being used by several of the larger groups and seems likely to be the method of choice for the major part of the genome. One starts with a set of mapped sequence-tagged sites (STSs) (Olson et al. 1989) from a particular chromosomal region. These are screened against a bacterial artificial chromosome (BAC) (or other large bacterial clone) library (Kim et al. 1996) to obtain overlapping clusters of clones from that region. Since whole-genome mapping efforts are nearing the target density of 1 STS per 100 kb [Hudson et al. 1995; D.R. Cox and R.M. Myers et al. 1997, World Wide Web (WWW) site for the Stanford Human Genome Center, http://shgc.stanford.edu; E. Lander et al. 1997, WWW site for the Whitehead Institute/MIT Center for Genome Research, http://www-genome.wi.mit.edu], with several intensively mapped chromosomes already exceeding it (Nagaraja et al. 1997, Bouffard et al. 1997), and BACs average 130 kb or more in size in current libraries (Kim et al. 1996), this STS density should be adequate to obtain contiguous clone coverage of much of the genome; most gaps that remain should be closable by developing new STSs directly from the sequence adjacent to the gap and rescreening the library.

Restriction digests are performed on the clones obtained from the screens to determine their sizes and extent of overlap, and to eliminate anomalous clones, which generally have fingerprints inconsistent with other clones in the group. Selected clones are then sequenced using a two-stage strategy, consisting of a shotgun phase in which a number of reads are generated from random M13 or plasmid subclones, followed by a directed, or "finishing" phase. In the latter, the shotgun reads are assembled into contigs, the assembly is inspected and tested for correctness, additional data are collected to close gaps and resolve low-quality regions (e.g., compressions), and editing is performed to correct errors in assembly and to resolve discrepancies between reads and other data anomalies.

The amount of finishing effort required depends in part on the desired accuracy and completeness of the final sequence. In the case of the human genome, the goal that has been agreed upon by the U.S. funding agencies and essentially all of the major sequencing groups is a complete and highly accurate sequence with less than one error per 10 kb. There are several reasons for this target: The genome sequence should serve as a reference against which human variation can be cataloged, and consequently it should have an error rate substantially lower than the estimated polymorphism rate of one per kilobase; it should be accurate enough to permit genes to be identified and distinguished from pseudogenes, so only a minority of genes should have any errors in their coding regions (which average >1 kb in length); and it should be accurate enough to permit any region of the genome to be reliably obtained by PCR (in particular, gaps should be small, infrequent and of known size). Current experience indicates that this level of accuracy is attainable without unduly inflating the cost.

Not surprisingly (in view of the profound impact cloning has had upon molecular biology), a clone-based approach has important strengths. Clones provide modularity, which is a crucial consideration when analyzing something as large and complex as the human genome. In particular, they make it possible to target specific regions; to partition the project among multiple investigators without forcing them to interact with each other; to isolate problematic regions (e.g., repeats); and to adapt the sequencing strategy as needed in regions with unusual features (e.g., GC-richness, high repeat density). Importantly, clone-by-clone sequencing forces one to confront early on the issue of finishing and ensures that feedback regarding data quality is obtained quickly.

In addition, clones provide an important technical resource for sequencing. They permit efficient resequencing and gap-filling at the finishing stage, and make it possible to test the correctness of the assembly by means of restriction digests. Finally, because each clone represents a single haplotype, problems caused by the presence of polymorphisms are eliminated.

Whole-Genome Shotgun Sequencing

Weber and Myers propose whole-genome shotgun sequencing of the human genome as an alternative to clone-by-clone sequencing. Their approach would consist of a single whole-genome library construction and characterization phase (for the entire project), followed by a single shotgun phase, followed by a single finishing phase. In particular, finishing issues would not be addressed until fairly late in the project.

This is inherently a monolithic approach incompatible with clone-by-clone sequencing, and consequently it requires careful scrutiny. I will discuss a number of objections to it, but the most serious one is that for a variety of reasons (detailed below) the finishing stage has a
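
The accuracy target stated under Clone-by-Clone Sequencing (less than one error per 10 kb, against an estimated polymorphism rate of one per kilobase and coding regions averaging more than 1 kb) can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the article: it assumes errors fall independently and uniformly along the sequence (a Poisson model), and the function names and the example coding-region lengths are hypothetical choices made for this example.

```python
import math

# Figures quoted in the text.
ERROR_RATE = 1 / 10_000        # target: under one error per 10 kb
POLYMORPHISM_RATE = 1 / 1_000  # estimated: one polymorphism per kilobase


def expected_errors(region_bp: float, errors_per_bp: float) -> float:
    """Expected number of sequence errors in a region, assuming a uniform rate."""
    return region_bp * errors_per_bp


def prob_at_least_one_error(region_bp: float, errors_per_bp: float) -> float:
    """P(at least one error) under a Poisson model (an assumption of this sketch)."""
    return 1.0 - math.exp(-expected_errors(region_bp, errors_per_bp))


if __name__ == "__main__":
    # Coding-region lengths are illustrative; the text says only "average >1 kb".
    for coding_bp in (1_000, 2_000, 5_000):
        p = prob_at_least_one_error(coding_bp, ERROR_RATE)
        print(f"{coding_bp:>5} bp coding region: P(any error) = {p:.3f}")
    ratio = POLYMORPHISM_RATE / ERROR_RATE
    print(f"polymorphisms per sequencing error: {ratio:.0f}")
```

Under these assumptions a 1-kb coding region has roughly a 10% chance of containing any error at the limiting rate, which is consistent with the stated aim that only a minority of genes have errors in their coding regions, and true polymorphisms outnumber sequencing errors by about ten to one.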

