MIT 6.872 - Genomic Medicine: Basic Molecular Biology


Genomic Medicine: Basic Molecular Biology
Children’s Hospital Informatics Program, www.chip.org
Children’s Hospital • Boston
Harvard Medical School
Massachusetts Institute of Technology
Atul Butte, [email protected]

Basic Biology
• Organisms need to produce proteins for a variety of functions over a lifetime
  – Enzymes to catalyze reactions
  – Structural support
  – Hormones to signal other parts of the organism
• Problem one: how to encode the instructions for making a specific protein
• Step one: nucleotides

Basic Biology
• Naturally form double helices
• Redundant information in each strand
• Complementary nucleotides form base pairs
• Base pairs are put together in chains (strands)
[Figure: double-stranded DNA with the strand ends labeled 5’ and 3’]

Chromosomes
• We do not know exactly how strands of DNA wind up to make a chromosome
• Each chromosome has a single double-strand of DNA
• In humans, 22 chromosomes come in pairs (the autosomes)
• In human females, there are two X chromosomes
• In males, one X and one Y

What does a gene look like?
• Each gene encodes instructions to make a single protein
• DNA before a gene is called upstream, and can contain regulatory elements
• Introns may be within the code for the protein
• There is a code for the start and end of the protein coding portion
• Theoretically, the biological system can determine promoter regions and intron-exon boundaries using the sequence syntax alone

Area between genes
• The human genome contains 3 billion base pairs (3000 Mb) but only 35 thousand genes
• The coding region is 90 Mb (only 3% of the genome)
• Over 50% of the genome is repeated sequences
  – Long interspersed nuclear elements
  – Short interspersed nuclear elements
  – Long terminal repeats
  – Microsatellites
• Many repeated sequences are different between individuals

Genome size
• We’re the smartest, so we must have the largest genome, right?
• Not quite
• Our genome contains 3000 Mb (~750 megabytes)
• E. coli has 4 Mb
• Yeast has 12 Mb
• Pea has 4800 Mb
• Maize has 5000 Mb
• Wheat has 17000 Mb

Genomes of other organisms
• Plasmodium falciparum chromosome 2
Gardner M, et al. Science; 282: 1126 (1998).

mRNA is made from DNA
• Genes encode instructions to make proteins
• The design of a protein needs to be duplicable
• mRNA is transcribed from DNA within the nucleus
• mRNA moves to the cytoplasm, where the protein is formed

Protein

Digitizing amino acid codes
• Proteins are made of 20 (21) amino acids
• Yet each position can only be one of 4 nucleotides
• Nature evolved to use 3 nucleotides to encode a single amino acid
• A chain of amino acids is made from mRNA

Genetic Code
Nature; 409: 860 (2001).

Molecular Biology
[Diagram: concept map linking Nucleotides, Double helix, Chromosome, Gene/DNA, Genome, tRNA, Ribosome, mRNA, Signal Sequence, Amino Acid, and Protein through the relations "Are in", "Holds", "Held in", "Joined by", "Operates on", and "Prefixed by"]

Central Dogma
[Diagram: the same concept map as on the Molecular Biology slide]

Protein targeting
• The first few amino acids may serve as a signal peptide
• Works in conjunction with other cellular machinery to direct protein to the right place

Transcriptional Regulation
• Amount of protein is roughly governed by RNA level
• Transcription into RNA can be activated or repressed by transcription factors

What starts the process?
• Transcriptional programs can start from
  – Hormone action on receptors
  – Shock or stress to the cell
  – New source of, or lack of, nutrients
  – Internal derangement of cell or genome
  – Many, many other internal and external stimuli

Temporal Programs
• Segmentation versus Homeosis: same two houses at different times
Scott M. Cell; 100: 27 (2000).

mRNA
• mRNA can be transcribed at up to several hundred nucleotides per minute
• Some eukaryotic genes can take many hours to transcribe
  – Dystrophin takes 20 hours to transcribe
• Most mRNA ends with poly-A, so it is easy to pick out
• Can look for the presence of specific mRNA using the complementary sequence

Periodic Table for Biology
• Knowing all the genes is the equivalent of knowing the periodic table of the elements
• Instead of a table, our periodic table may read like a tree

More Information
• Department of Energy Primer on Molecular Genetics http://www.ornl.gov/hgmis/publicat/primer/primer.pdf
• T. A. Brown, Genomes, John Wiley and Sons, 1999.

Gene Measurement Techniques
DNA
• Sequencing
• Polymorphisms
RNA
• Serial analysis of gene expression
• DNA Microarrays
• Wafers
Protein
• 2D-PAGE
• Mass spectrometry
• Protein arrays

Sequencing Reactions
• Sanger Reactions
• Four-color fluorescence-based sequence detection
• Laser detector
• Automated process
Jaklevic JM, et al. Annu Rev Biomed Eng 1:649 (1999).

Sanger Chain Termination
Sterky, F. & Lundeberg, J. Sequence analysis of genes and genomes. J Biotechnol 76, 1-31 (2000).

Sanger Method

Sequencing Reactions
• PHRED: base-quality score for each base, based on probability of erroneous call
• PHRED quality score of X means error probability of 10^(-X/10)
• PHRED score of 30 means 99.9% accuracy for base call
Buetow KH, et al. Nature Genetics 21:323 (1999).

Sequencing Reactions
• PHRAP: assembles sequence data using base-quality scores into sequence contigs
• Assembly-quality scores
• Most of the genome was sequenced over 12 months
• Highest throughput center at Whitehead: 100,000 sequencing reactions per 12 hours
• Robots pick 100,000 colonies, sequence 60 million nucleotides per day

Assembly
• Contamination from non-human sequences removed
• Clones overlaid on physical map
• High-quality semiautomatic sequencing from both ends of very large numbers of human genome fragments
• Overlaps take memory: Drosophila 600 GB RAM
• Human: 10 4-processor 4 GB and 16-processor 64 GB machines, 10K CPU hrs

Genome Browsers
• Genome browsers: University of California at Santa Cruz and EnsEMBL
• Overlap sequence, cytogenetic, SNP, genetic maps
• Overlap annotations, disease genes

Single Nucleotide Polymorphisms
• Three-step approach
• First, find the genes you are interested in
• Second, catalog all the polymorphisms in a gene (by sequencing)
• Third, measure those polymorphisms in a larger population

Clinical use of SNPs
• A new publication associating a SNP with disease is almost a daily occurrence
Gao, X. et al. Effect of a single amino acid change in MHC class I molecules on the rate of progression to AIDS. N Engl J Med 344, 1668-75 (2001).

SNPs and pharmacogenomics
• Genes will help us determine
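
Worked example (not part of the original slides): the PHRED relationship stated on the Sequencing Reactions slide, where a quality score of X means an error probability of 10^(-X/10), can be checked numerically. The sketch below is a minimal illustration; the function names are hypothetical and do not come from the PHRED software itself.

```python
import math


def phred_to_error_probability(quality: float) -> float:
    """Error probability implied by a PHRED quality score: 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(p_error: float) -> float:
    """Inverse mapping: Q = -10 * log10(p). Requires 0 < p <= 1."""
    if not 0 < p_error <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p_error)


def accuracy_percent(quality: float) -> float:
    """Base-call accuracy, in percent, implied by a PHRED quality score."""
    return 100 * (1 - phred_to_error_probability(quality))


if __name__ == "__main__":
    # Print the score, error probability, and accuracy for a few scores.
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_probability(q), accuracy_percent(q))
```

Each additional 10 points on the score corresponds to a tenfold drop in the error probability, which is why a score of 30 corresponds to the 99.9% accuracy quoted on the slide.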

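Worked example (not part of the original slides): two figures from the slides follow from simple counting. With 4 possible nucleotides, each base carries 2 bits, so 3000 Mb fits in about 750 megabytes, as the Genome size slide states; and 3 nucleotides per codon is the shortest length that can distinguish 20 (or 21) amino acids, as the Digitizing amino acid codes slide implies. The helper names below are hypothetical, and the sketch assumes an uncompressed 2-bit encoding.

```python
import math


def bits_per_base(alphabet_size: int = 4) -> float:
    """Bits needed to store one symbol from an alphabet of the given size."""
    return math.log2(alphabet_size)


def genome_megabytes(megabases: float) -> float:
    """Storage for a genome of the given size, at 2 bits per base, 8 bits per byte."""
    return megabases * bits_per_base() / 8


def min_codon_length(n_amino_acids: int, alphabet_size: int = 4) -> int:
    """Shortest codon length whose combinations cover every amino acid."""
    length = 1
    while alphabet_size ** length < n_amino_acids:
        length += 1
    return length


if __name__ == "__main__":
    print(genome_megabytes(3000))   # human genome, 3000 Mb
    print(90 / 3000)                # coding fraction: 90 Mb of 3000 Mb
    print(min_codon_length(20))     # codon length needed for 20 amino acids
    print(4 ** 2, 4 ** 3)           # combinations with 2 and with 3 nucleotides
```

Two nucleotides give only 16 combinations, which is too few for 20 amino acids, while three give 64, which leaves room for redundancy in the code.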

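Worked example (not part of the original slides): the path from DNA to mRNA to protein, and the idea of detecting a sequence with its complement, can be sketched in a few lines. This is a simplified illustration that assumes the standard genetic code, an intron-free coding strand, and translation starting at the first AUG; the function names and the example sequence are made up for the demonstration.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """mRNA matching the coding strand, with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """One-letter amino acid chain from the first AUG to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGCCTGGTAA"               # made-up example sequence
    mrna = transcribe(gene)
    print(mrna, translate(mrna))
    print(reverse_complement("AAAA"))   # a poly-A tail pairs with poly-T
```

The last line reflects the point made on the mRNA slide: because most mRNA ends with poly-A, a complementary poly-T sequence can be used to pick it out.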