Computational Molecular Biology Symposium
March 12th, 2003
Carnegie Mellon University
Organizer: Dannie Durand
Sponsored by the Department of Biological Sciences and the Howard Hughes Medical Institute

Human Genetics and Genomics
12:30  Introduction
       Dannie Durand, Carnegie Mellon
12:45  Genome Assemblies and Interval Graphs
       Martin Farach-Colton, Rutgers
2:15   Patterns of Human Genetic Diversity: Implications for Human Evolution and Disease
       Sarah Tishkoff, Maryland
3:15   Algorithms for extracting information from Human Genetic Variation
       Russell Schwartz, Carnegie Mellon
4:30   Meet the speakers, 3301 NSH

From Genes to Organism

Human Genetics and Genomics
Human genome sequence, February 2001
• Acquisition
• Interpretation

Human Genome
Complete set of genetic information in each cell
“Human blueprint”
Consensus sequence: genetic material common to all humans
– Cellular function
– Tissue type differentiation
– Development
– Species specific traits
Genomic variation: differences between individuals
– Genetic basis for disease
– Human history
– Human evolution

Human Genetics and Genomics: Acquisition

…aggaggcctcgcctctcccagcatgggctggggctcctgtcccccactgtgtgtgcctggggcctggccaggactcccagtga…
[Figure: cell, chromosome, DNA]

DNA Sequencing: A whirlwind introduction

DNA
Double-stranded polymer
Four letter alphabet: A, C, G, T
Base pairing: A:T, G:C
Orientation
5’ …TACATAGGC… 3’
3’ …ATGTATCCG… 5’

[Figure: E. coli genome (bacterial chromosome, plasmids); human genome (nucleus, mitochondrial genome)]
Lisa Stubbs, Oak Ridge National Lab

Genome Complexity
Organism                 Mbases    # of genes
E. coli                  4.6       ~4,000
Baker’s yeast            12.1      ~6,000
Fruit fly                180.0     ~13,000
Worm                     97.0      ~18,000
Human                    3200.0    ~30,000
Human mitochondrion      17 kb     37

Genome Features
• Genes
  …aggaggcctcgcctctcccagcatgggctggggctcctgtcccccactgtgtatgcctggggcctggccaggactcccagtga…
  [Figure label: protein coding sequence]
  A gene is a location on a chromosome that encodes a protein.
• Regulatory regions
  [Figure label: promoter]
  Regulatory regions: non-coding sequences that determine transcription
• Repeated regions
  …aggcgagagagagagagagagcctggggcctggccaggctggggctcctgtccagagagagagaagtga…
  – Signature pattern
  – Length of pattern
  – Copy number
  – Distribution within the genome
  – Make up >50% of the human genome
  – Complicate sequence assembly
  – Vary from one individual to the next
  Useful markers in studying diversity

DNA Sequencing
• Tools for manipulating DNA
  Enzymes that cut, paste and modify
• Methods for copying DNA fragments
• Methods for determining the sequence of fragments
• Assembling fragments into finished sequence

DNA Sequencing: Methods for copying DNA fragments

Cloning – copying fragments
[Figure: isolate DNA; fragmentation; insert fragments into plasmid; amplification]

Cloning vectors
Properties
– Size of insert
– Host
– Stability of insert
Examples
– Plasmids 5-10 Kb
– Lambda Phage 20 Kb
– BAC (Bacterial Artificial Chrom.) 100-200 Kb
– YAC (Yeast Artificial Chrom.) 1000 Kb

DNA Sequencing: Methods for determining the sequence of fragments

Fragment Sequencing
Given a pool of a particular DNA fragment
– Generate all prefixes (those enzymes again)
– Sort them by size (gel electrophoresis)
– Read base composition of fragment
[Figure: porous gel between − and + electrodes, with lanes A, C, G, T separating the prefixes by size]

Improvements in Sequencing Technology
• Fluorescent bases
  [Figure: porous gel between − and + electrodes, with a laser and detector reading the labeled fragments]
• Automation
• Polymerase Chain Reaction (PCR)
• Capillary-based sequencing machines
  [Figure: ABI 3700 sequencer]

History of Sequencing
1971  Nobel prize for restriction enzymes
1973  First recombinant DNA
1980  Nobel prize for DNA sequencing
1988  Congress establishes GenBank
1995  First genomic sequence
1998  First multicellular organism
2000  Fly genome
2000  First plant genome
2001  Human genome
2003  Mouse genome
22 million sequences
28 billion base pairs

DNA Sequencing: Assembling fragments into finished sequence

Sequence Assembly
Limits of gel electrophoresis: ~500 bp in one “read”
To sequence more than 500 bp:
– Sequence 500 bp fragments separately
– Combine computationally using sequence comparison

Human Genetics and Genomics: Interpretation

Variation within the Human Genome
Polymorphism
– Occurrence of more than one type of genetic feature within a population.
– A common variation in the sequence of DNA among individuals.

Variation within Human Populations
Polymorphisms:
– Alleles – variant types of the same gene
– Single Nucleotide Polymorphisms (SNPs)
– Haplotypes
– Tandem Repeats
– Indels

Example: Blood Groups
Variation within an individual:
  Blood type: AB
Variation within a population:
  Blood groups: A, B, O
Blood type is polymorphic in the human population.
allele – One of the variant forms of a gene at a particular locus

Single Nucleotide Polymorphisms
SNP: Variation at a single nucleotide position.
Roughly one every 1,000 bases in the human
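
As a rough illustration of the SNP definition above (variation at a single nucleotide position), the sketch below compares sequences from several individuals column by column and reports the positions where more than one base occurs. It assumes the sequences are already aligned and of equal length. The function name `find_snps` and the sample sequences are made up for this example and do not come from the slides.

```python
from collections import Counter


def find_snps(aligned):
    """Return {position: Counter of bases} for every column that varies.

    `aligned` is a list of equal-length DNA strings, one per individual.
    Positions are 0-based. This is an illustrative sketch, not a
    production variant caller: it ignores sequencing errors and gaps.
    """
    if not aligned:
        return {}
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    snps = {}
    for pos in range(length):
        # Count how often each base appears at this position.
        column = Counter(seq[pos].upper() for seq in aligned)
        if len(column) > 1:  # more than one base observed: a variable site
            snps[pos] = column
    return snps


# Hypothetical aligned sequences from four individuals.
individuals = [
    "aggctcatct",
    "aggctcgtct",
    "aggctcatct",
    "agcctcatct",
]

for pos, counts in find_snps(individuals).items():
    print(pos, dict(counts))
```

In practice a site is usually called polymorphic only when the rarer base is reasonably common in the population, so a real analysis would add a frequency threshold on top of this column scan.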