Sequencing and Mapping

The genome of an organism is the complete set of DNA sequences that constitutes the total genetic information content of a cell. This information is packaged in a set of chromosomes in the cell. Eukaryotes also have an additional set of extrachromosomal genes, located outside the nucleus in the energy-producing organelles called mitochondria. Plants and algae also have genes located in the chloroplasts. By the word genome, we usually mean the nuclear genome.

For a prokaryotic cell, the genome is a circular DNA molecule. For eukaryotes, such as humans, the genome consists of a set of linear DNA molecules contained in different chromosomes. In most eukaryotes, there are two copies of each chromosome, and hence two copies of each gene. This is called the diploid complement. The nucleus of a haploid cell contains only one copy of each chromosome; haploid nuclei are found only in reproductive cells. The number of chromosomes in a genome is characteristic of a given species. The following table gives examples (blank cells are not given in the source).

Organism        Genome size (kb)   No. of chromosomes   Avg. DNA per chromosome (kb)
Prokaryotes
  E. coli                4 000              1                        4 000
Eukaryotes
  Yeast                 20 000             16                        1 250
  Fruit fly            165 000              4                       41 250
  Human              3 200 000             23                      130 000
  Mouse              3 454 200
  Maize             15 000 000             10                    1 500 000
  Salamander        90 000 000             12                    7 500 000
  Puffer fish          375 000

As the table shows, genome size does not predict the complexity of the organism, and there is no direct correlation between genome size and the number of chromosomes. It is generally true that a more complex species needs more genes, but other factors are also involved. Only about 2-3% of the human nuclear genome actually takes part in the production of proteins. Even if we ignore the introns, apparently 70 to 80% of the genome is unused. This paradox may be due to the existence of highly repetitive DNA.

In order to understand the structure and functions of the genome, we first need to extract the complete base-pair sequence of the chromosomes.
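The last column of the table above is simply the genome size divided by the number of chromosomes. A minimal Python sketch of that arithmetic follows, using only the rows for which the table gives both values; the names `genomes` and `average_kb_per_chromosome` are illustrative and not part of the original notes. For the human row the division gives roughly 139 000 kb, so the table's 130 000 appears to be a rounded figure.

```python
# (genome size in kb, number of chromosomes), copied from the table above.
# Mouse and puffer fish are omitted because the table gives no chromosome count.
genomes = {
    "E. coli": (4_000, 1),
    "Yeast": (20_000, 16),
    "Fruit fly": (165_000, 4),
    "Human": (3_200_000, 23),
    "Maize": (15_000_000, 10),
    "Salamander": (90_000_000, 12),
}


def average_kb_per_chromosome(size_kb, chromosomes):
    """Average amount of DNA per chromosome, in kb."""
    return size_kb / chromosomes


averages = {
    name: average_kb_per_chromosome(size_kb, chromosomes)
    for name, (size_kb, chromosomes) in genomes.items()
}

for name, avg in averages.items():
    print(f"{name:<12}{avg:>14,.0f} kb")
```

Sorting `averages` by value makes the point of the paragraph above concrete: the ordering by DNA per chromosome does not follow any obvious ordering by organism complexity.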
The goal of the Human Genome Project was to obtain this complete DNA sequence information. The process of obtaining this information is called sequencing. Currently available biotechnology cannot sequence a DNA molecule longer than a few hundred bp (less than 1000 bp) in one pass. Before the genome project was started, biologists began sequencing thousands of mRNAs corresponding to coding genes. The process involved first purifying the mRNA and then obtaining complementary DNA (cDNA) using reverse transcriptase. Sequencing the cDNA gives immediate information about the DNA of the original gene. However, the cDNA fragment containing a gene is considerably smaller than the genomic DNA. This difficulty has given rise to several challenging problems in computational biology. We will discuss these issues in this chapter. Before we do that, we need to briefly review some of the molecular biology laboratory techniques that have been developed during the last few decades.

DNA Sequencing: separating DNA segments according to size (Gel Electrophoresis)

The DNA sequence can be read by a technique called gel electrophoresis, which separates DNA molecules into groups according to their lengths. Gel electrophoresis has high resolution; even fragments that differ by a single nucleotide can be separated. The sample molecules are placed in a gel under the influence of an electric field. The DNA or RNA molecules, which are slightly negatively charged, migrate towards the positive terminal. The speed of migration is inversely proportional to the length of the molecule: longer molecules move more slowly, shorter ones move faster. All molecules are initially placed at the top of the 'well', and after a few hours they have moved to different locations depending on their lengths. If the molecules are labeled with radioactive isotopes, their positions can be photographed on film. A DNA or RNA molecule can be sequenced using these techniques as follows.
Given a DNA molecule, obtain all fragments that start at the beginning of the sequence and end in the letter A. Similarly, obtain all fragments ending in T, C and G. For example, if the sequence is GATTCGGATTTACT, the fragments that end in T are GAT, GATT, GATTCGGAT, GATTCGGATT, GATTCGGATTT and the whole sequence GATTCGGATTTACT. These fragments are produced by enzymatic reactions in the presence of DNA polymerase, which copies the DNA sequence, and the base analogs ddATP, ddTTP, ddCTP and ddGTP. In each of four separate reactions, one per well, the corresponding base analog stops the replication at positions occupied by that base. The fragments are also labeled with a primer at the beginning. In modern automated sequencing, the labeled primer is replaced by different fluorescent probes, and the signals from the probes are detected by special detectors.

After a period of incubation, the fragments are placed in four wells, the A-well, the T-well, the C-well and the G-well, and subjected to the electric field simultaneously. Because every fragment length occurs in exactly one well, reading the bands in order of increasing length and noting the well in which each band appears gives the precise sequence of the original fragment. The figure below illustrates the principle. We assume here that the positive terminal is at the top, so the shorter fragments leave their mark near the top.

       A          T          C          G
                                     -------    G
    -------                                     A
               -------                          T
               -------                          T
                          -------               C
                                     -------    G
                                     -------    G
    -------                                     A
               -------                          T

Read from top to bottom, the bands shown spell G, A, T, T, C, G, G, A, T, the first nine bases of the example sequence.
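The read-out procedure described above can be sketched in a few lines of Python. This is a minimal illustration under the same simplification the text uses: each fragment is treated as a prefix of the sequence ending at the terminating base, and the chemistry (polymerase, base analogs, labeling) is not modeled. The function names `terminated_fragments` and `read_gel` are hypothetical, introduced only for this example.

```python
def terminated_fragments(sequence):
    """Sort every prefix of `sequence` into the well of the base it ends in."""
    wells = {"A": [], "T": [], "C": [], "G": []}
    for end in range(1, len(sequence) + 1):
        fragment = sequence[:end]
        wells[fragment[-1]].append(fragment)
    return wells


def read_gel(wells):
    """Recover the sequence by reading bands from shortest to longest.

    Each band is recorded as (fragment length, well label). Every length
    occurs in exactly one well, so sorting by length orders the well labels
    into the original sequence.
    """
    bands = [
        (len(fragment), base)
        for base, fragments in wells.items()
        for fragment in fragments
    ]
    return "".join(base for _, base in sorted(bands))


wells = terminated_fragments("GATTCGGATTTACT")
print(wells["T"])       # the fragments that end in T
print(read_gel(wells))  # the sequence read back from the four wells
```

Note that `read_gel` uses only the length of each fragment and the well it came from, which is exactly the information a gel provides: the position of a band encodes its length, and the lane encodes its final base.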


UCF CAP 5937 - Mapping and Sequencing
