Sequencing and Mapping
CAP 5937-01 Bioinformatics
Fall 2004
Amar Mukherjee

Biology Background

The genome of an organism is the complete set of DNA sequences that constitutes its total genetic information content in a cell. This information is packaged in a set of chromosomes in the cell. Eukaryotes also have an additional set of extrachromosomal genes. These are located outside the nucleus of the cell, within the energy-producing organelles called mitochondria.

For plants and algae, there are also genes located in the chloroplasts. By the word genome, we usually mean the nuclear genome. For a prokaryotic cell, the genome is a circular DNA molecule. For eukaryotes, such as humans, the genome consists of a set of linear DNA molecules contained in different chromosomes.

In most eukaryotes, there are two copies of each chromosome, and hence two copies of each gene. This is called the diploid complement. A haploid nucleus contains only one copy of each chromosome and is found only in reproductive cells. The number of chromosomes in a genome is characteristic of a given species.

Example

Organism       Genome Size (kb)   No. of Chromosomes   Avg. DNA per Chromosome (kb)
------------------------------------------------------------------------------------
Prokaryotes
E. coli               4 000                1                       4 000
Eukaryotes
Yeast                20 000               16                       1 250
Fruit Fly           165 000                4                      41 250
Human             3 200 000               23                     130 000
Mouse             3 454 200
Maize            15 000 000               10                   1 500 000
Salamander       90 000 000               12                   7 500 000
Puffer Fish         375 000

Clearly, genome size does not predict the complexity of the organism, and there is no direct correlation between genome size and the number of chromosomes. It is generally true that a more complex species needs more genes, but other factors are also involved. Only about 2-3% of the human nuclear genome actually takes part in the production of proteins.
Even if we ignore the introns, apparently 70 to 80% of the genome is unused! This paradox may be due to the existence of highly repetitive DNAs.

Sequencing

In order to understand the structure and functions of the genome, we first need to extract the complete base-pair sequence of the chromosomes. The goal of the Human Genome Project was to obtain this complete DNA sequence information. The process of obtaining this information is called sequencing. Currently available biotechnology cannot sequence a DNA molecule longer than a few hundred bp (less than 1000 bp).

Before the genome project began, biologists had started sequencing thousands of mRNAs corresponding to coding genes. The process involved first purifying the mRNA, then obtaining complementary DNA (cDNA) using reverse transcriptase. Sequencing the cDNA gives immediate information about the DNA of the original gene. However, the cDNA fragment containing a gene is considerably smaller than the genomic DNA. This difficulty has given rise to several challenging problems in computational biology.

Molecular Biology Laboratory Techniques: DNA Sequencing

DNA Sequencing: separating DNA segments according to size (Gel Electrophoresis)

The DNA sequence can be read by a technique called gel electrophoresis, which separates DNA molecules into groups depending on their lengths. Gel electrophoresis has high resolution; even fragments that differ by a single nucleotide can be separated. The sample molecules are placed in a gel under the influence of an electric field.

The DNA or RNA molecules (which are slightly negatively charged) migrate towards the positive terminal. The speed of migration is inversely proportional to the length of the molecule: longer molecules move slowly, shorter ones move faster. All molecules are initially placed at the top of the 'well', and after a few hours each molecule has moved to a location that depends on its length.
If the molecules are labeled with radioactive isotopes, their positions can be photographed on a film.

DNA Sequencing

A DNA or RNA molecule can be sequenced using these techniques as follows. Given a DNA molecule, obtain all fragments that end in the letter A. Similarly, obtain all fragments ending in T, C and G. For example, if the sequence is GATTCGGATTTACT, the fragments that end in T are GAT, GATT, GATTCGGAT, GATTCGGATT, GATTCGGATTT and the whole sequence GATTCGGATTTACT. These fragments are formed by special enzymatic chemical reactions in the presence of DNA polymerase and ddATP, ddTTP, ddCTP and ddGTP. In each of the four individual wells, replication of the DNA sequence is stopped by the corresponding base analog at positions occupied by that base (A, T, C or G). The sequences are also labeled with a primer at the beginning.

In modern automated sequencing, the primer is replaced by different fluorescent probes, and the signals from the probes are detected by special detectors. After a period of incubation, the fragments are placed in four wells, the A-well, the T-well, the C-well and the G-well, and subjected to an electric field simultaneously. From the resulting bands we can deduce the precise sequence of the original fragment. The figure below illustrates the principle. We assume here that the positive terminal is on the top and the shorter fragments leave their mark near the top.

       A        T        C        G
G                                 -------
A      -------
T               -------
T               -------
C                        -------
G                                 -------
G                                 -------
A      -------
T               -------
T               -------
T               -------
A      -------
C                        -------
T               -------

If you now read the horizontal bars from top to bottom, noting the well each bar belongs to, you will get the entire sequence GATTCGGATTTACT. For further details, see http://web.utk.edu/~khughes/main.htm

This gel-based sequencing technique was developed in the 1970s by Maxam and Gilbert and by Sanger. Since the Maxam-Gilbert method obtained the DNA fragments by chemical degradation of part of the sequence, it was not very reliable.
A more efficient and reliable method is to use PCR which we describe
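
Returning to the four-well read-out described earlier, the procedure can be sketched in a few lines of Python. This is only an illustration of the simplified model used in these notes, where each well holds every prefix of the sequence that ends in that well's base; it ignores strand complementarity and all laboratory detail. The function names terminated_fragments, gel_bands and read_gel are hypothetical names chosen for this sketch.

```python
def terminated_fragments(sequence):
    """Group every prefix of `sequence` by its last base (one group per well).

    Assumes the sequence contains only the letters A, T, C and G.
    """
    wells = {base: [] for base in "ATCG"}
    for end in range(1, len(sequence) + 1):
        prefix = sequence[:end]
        wells[prefix[-1]].append(prefix)
    return wells


def gel_bands(wells):
    """Reduce each fragment to what the gel shows: its length and its well."""
    return [(len(fragment), base)
            for base, fragments in wells.items()
            for fragment in fragments]


def read_gel(bands):
    """Read the bands from shortest to longest fragment to recover the sequence."""
    return "".join(base for _, base in sorted(bands))


sequence = "GATTCGGATTTACT"
wells = terminated_fragments(sequence)
print(wells["T"])                  # fragments collected in the T-well
print(read_gel(gel_bands(wells)))  # sequence read off the simulated gel
```

Note that read_gel sees only the fragment lengths and the well each band sits in, which is all the gel reveals. Because every length from 1 to the full sequence length occurs in exactly one well, sorting the bands by length is enough to reconstruct the sequence.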


UCF CAP 5937 - Sequencing and Mapping
