Lecture 22

DNA sequencing technologies, computer technologies, and a variety of other technologies have enabled the analysis of entire genomes, where we were once confined to studying one gene at a time.

Structural genomics
o Organization and sequence of the genetic information contained within a genome

Genetic maps
o Provide a rough approximation of the locations of genes relative to the locations of other known genes
o Based on recombination frequency
o If the recombination frequency is 50%, the loci are located on different chromosomes or far apart on the same chromosome
o If it is less than 50%, the loci are close together on the same chromosome
   o Linked genes
o Rate of recombination is proportional to the physical distance between the loci
o Distances on maps are measured in percent recombination, or map units
o Previously, genes could be detected only by observing their influence on a trait
o Genetic maps were limited to single-locus traits, placed by evidence of recombination

Limitations to genetic maps
o Limited resolution
o Only approximations of real physical distances along the chromosome
   o They are based on rates of crossing over, which vary from one part of the chromosome to another

Physical maps
o Based on the direct analysis of DNA; they place genes in relation to distances measured in number of base pairs
o Connect isolated pieces of genomic DNA that have been cloned
o Higher resolution and more accurate than genetic maps
o Restriction mapping determines the positions of restriction sites on DNA
   o DNA is cut with a restriction enzyme and the fragments are separated by gel electrophoresis
   o The number of restriction sites in the DNA and the distances between them can be determined from the number and positions of bands on the gel
   o A single digest does not tell the order or precise location of the restriction sites
   o To map the sites, one sample of the DNA is cut with one restriction enzyme, a second is cut with a different restriction enzyme, and a third is cut with both
   o The fragments are separated by gel electrophoresis and their sizes are compared
   o Overlap among the fragments can be used to position the restriction sites on the original DNA

Sequencing an entire genome
o Size is the main issue
o Usually only small fragments (500-700 bp) can be sequenced at one time
o The difficulty is putting the short sequenced pieces of DNA back together
o Human Genome Project: map the entire human genome

Map-based sequencing (map-based strategy)
o Short sequenced fragments are assembled into a whole-genome sequence by first creating detailed genetic and physical maps of the genome
   o The maps provide the locations of genetic markers (restriction sites, other genes, or known DNA sequences) at regularly spaced intervals along each chromosome
   o The markers are later used to align the short sequenced fragments into their correct order
o After the genetic and physical maps are created, the chromosomes are separated by pulsed-field gel electrophoresis
   o Standard gel electrophoresis cannot separate pieces this large
o Each chromosome is then cut up by partial digestion with a restriction enzyme
   o Partial digestion: the restriction enzyme is allowed to act for only a limited time, so not all sites in every DNA molecule are cut
   o Produces a set of large overlapping DNA fragments, which are then cloned
o The large-insert clones are put together in the correct order on the chromosome
   o Method 1: relied on the presence of a high-density map of genetic markers
   o Contig: a set of two or more overlapping DNA fragments that form a contiguous stretch of DNA
o Very difficult and slow process
o Sequence the fragments that allow you to infer the linear sequence of the chromosome
o Also called clone-by-clone sequencing

Whole-genome shotgun sequencing
o Small-insert clones are prepared directly from genomic DNA and sequenced
o Powerful computer programs then assemble the entire genome by examining overlap among the small-insert clones
o These clones can be placed into plasmids, which are simple and easy to manipulate
o Fragment the genome into workable sizes, clone the fragments, sequence all clones
o A large amount of repetitive DNA makes assembly hard
o Computer assembly is essential; this method is much faster

Single Nucleotide Polymorphisms (SNPs)
o A site at which individual members of a species differ in a single base pair is called a SNP
o Arising through mutation, SNPs are inherited as allelic variants but usually do not cause phenotypic differences
o Each arose from a single mutation on a particular chromosome and then spread throughout the population
   o Individuals differ from each other at roughly 1 in every 1,000 nucleotides
o Each SNP is initially associated with the other SNPs present on the particular chromosome on which the mutation arose
o The specific set of SNPs on a chromosome or part of a chromosome is called a haplotype
   o SNPs within a haplotype are physically linked and tend to be inherited together
   o New haplotypes can arise through mutation or through crossing over, which breaks up the particular set of SNPs in a haplotype
o The nonrandom association between genetic variants within a haplotype is called linkage disequilibrium
o Use: as markers in linkage studies
   o When a SNP is close to a disease-causing locus, the two are usually inherited together
   o Can reveal the presence of genes that affect the disease
o Genome-Wide Association Studies (GWAS): using SNPs to find genes
   o Look at the SNPs present and look for correlations with a disease or other phenotype
   o A close association might indicate that the SNP is in or near a gene that plays a big role in determining that disease
o Compare SNPs between two groups and look for correlations (control group vs. unknown)

Copy Number Variants (CNVs)
o Differences among people in the number of copies of large DNA sequences
o Deletions and duplications
o Most contain multiple genes and potentially affect the phenotype by altering gene dosage and by changing the position of sequences, which may affect the regulation of nearby genes

Expressed Sequence Tags (ESTs)
o If only protein-encoding genes are of interest, mRNA is examined instead of the entire genomic DNA sequence
o RNA is examined using ESTs: markers associated with DNA sequences that are expressed as RNA
o Isolate RNA from a cell and use reverse transcription to produce cDNA fragments that correspond to the RNA
   o Short stretches from the ends of the cDNA are sequenced; each is called a tag, which provides a marker that identifies the DNA fragment

Annotation
o After a gene has been identified it must be annotated: its sequence information is linked to other information about its function and expression, the protein that it encodes, and similar genes in other species
o Start by applying genetic "spelling and grammar" rules
   o For example, find known consensus sequences such as TATA boxes, splice sites, and open reading frames
o Functional annotation example: use all mRNA to help identify expressed genes
   o Identification of all RNA molecules transcribed from the genome (transcriptome) and all proteins encoded by the genome (proteome)
o Comparative approach: uses genes from other
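Worked example for the genetic maps section: a minimal sketch of how a recombination frequency is turned into a map distance and read against the 50% threshold. The function names and the testcross counts are hypothetical, chosen only for illustration.

```python
def recombination_frequency(recombinant: int, total: int) -> float:
    """Return recombinant progeny as a percentage of all progeny."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= recombinant <= total:
        raise ValueError("recombinant must be between 0 and total")
    return 100.0 * recombinant / total


def interpret(rf_percent: float) -> str:
    """Apply the rule from the notes: 50% means unlinked or far apart."""
    if rf_percent >= 50.0:
        return "unlinked or far apart"
    return "linked"


# Hypothetical testcross: 120 recombinant progeny out of 1000 scored.
rf = recombination_frequency(recombinant=120, total=1000)
print(f"{rf} map units: {interpret(rf)}")
```

Here the percent recombination is used directly as the distance in map units, which is only an approximation of physical distance, as noted under the limitations of genetic maps.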
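Worked example for whole-genome shotgun sequencing: a toy sketch of assembly by overlap, assuming error-free reads from one strand and no repeats. It uses a simple greedy merge, which is an illustrative stand-in and not the method of any particular assembler; the reads and function names are made up.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining overlaps: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Hypothetical short reads, given out of order.
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))
```

With repetitive DNA, several reads can overlap equally well in more than one place, which is why the notes say repeats make assembly hard.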
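Worked example for linkage disequilibrium: the notes define it only as a nonrandom association between variants in a haplotype. One standard way to quantify it, not stated in the notes, is the coefficient D = P(AB) - P(A) * P(B), which is zero when the two SNPs are independent. The sketch below assumes two biallelic SNPs and made-up haplotype samples.

```python
def ld_coefficient(haplotypes, allele1: str, allele2: str) -> float:
    """D = P(allele1 and allele2 together) - P(allele1) * P(allele2).

    Each haplotype is a pair: (allele at SNP 1, allele at SNP 2).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p1 = sum(h[0] == allele1 for h in haplotypes) / n
    p2 = sum(h[1] == allele2 for h in haplotypes) / n
    p12 = sum(h[0] == allele1 and h[1] == allele2 for h in haplotypes) / n
    return p12 - p1 * p2


# Hypothetical samples: alleles always travel together vs. mix freely.
linked = [("A", "G")] * 4 + [("C", "T")] * 4
mixed = [("A", "G"), ("A", "T"), ("C", "G"), ("C", "T")] * 2

print(ld_coefficient(linked, "A", "G"))
print(ld_coefficient(mixed, "A", "G"))
```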
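Worked example for annotation: a minimal sketch of one "spelling and grammar" rule, scanning for open reading frames. It assumes the standard start codon ATG and stop codons TAA, TAG, and TGA, checks only the three frames of the given strand (not the reverse complement), and ignores introns; the function name and test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) positions of ORFs, with end exclusive.

    An ORF runs from an ATG to the first in-frame stop codon;
    min_codons counts the start and stop codons too.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


# Hypothetical sequence containing one short ORF.
seq = "CCATGAAATGGTAGCC"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])
```

Real annotation pipelines combine many such signals (consensus sequences, splice sites, expression evidence), so an ORF scan alone is only a first pass.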