MIT 7 03 - Analysis of Gene Sequences

Lecture 11
Analysis of Gene Sequences

Anatomy of a bacterial gene:

Sequence Element: Function

Promoter: To target RNA polymerase to DNA and to start transcription of an mRNA copy of the gene sequence.

Transcription terminator: To instruct RNA polymerase to stop transcription.

Shine-Dalgarno sequence: The S-D sequence in the mRNA will load ribosomes to begin translation. Translation almost always begins at an AUG codon in the mRNA (an ATG in the DNA becomes an AUG in the mRNA copy). Synthesis of the protein thus begins with a methionine.

Coding Sequence: Once translation starts, the coding sequence is translated by the ribosome along with tRNAs, which read three bases at a time in linear sequence. Amino acids will be incorporated into the growing polypeptide chain according to the genetic code.

Translation Stop: When one of the three stop codons [UAG (amber), UAA (ochre), or UGA] is encountered during translation, the polypeptide will be released from the ribosome.

Example: A gene coding sequence that is 1,200 nucleotide base pairs in length (including the ATG but not including the stop codon) will specify the sequence of a protein 1200/3 = 400 amino acids long. Since the average molecular weight of an amino acid is 110 Da, this gene encodes a protein of about 44 kDa, the size of an average protein.

[Figure: map of a bacterial gene and its mRNA, labeling the promoter, transcription start, S-D sequence, translation start (AUG), coding sequence (no stop codons), translation stop (UAG, UAA, or UGA), and transcription terminator.]

[Figure: The Genetic Code]

Classically, genes are identified by their function. That is, the existence of the gene is recognized because of mutations in the gene that give an observable phenotypic change. Historically, many genes have been discovered because of their effects on phenotype. Now, in the era of genomic sequencing, many genes of no known function can be detected by looking for patterns in DNA sequences.

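The worked example and the translation rules above can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the function names `translate` and `estimated_mass_kda` are hypothetical, the codon table is the standard genetic code, and starting at the first ATG found in the string is a simplifying assumption (a real ribosome is positioned by the S-D sequence).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

AVG_RESIDUE_MASS_DA = 110  # average amino acid mass quoted in the text


def translate(coding_dna):
    """Translate from the first ATG, three bases at a time, until a stop codon.

    Assumption for illustration: the first ATG in the string is the start codon.
    """
    start = coding_dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":  # TAG, TAA, or TGA: release the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


def estimated_mass_kda(coding_length_bp):
    """Rough protein mass in kDa for a coding sequence (stop codon excluded)."""
    return (coding_length_bp // 3) * AVG_RESIDUE_MASS_DA / 1000


print(translate("ATGAAATTTTAA"))   # Met-Lys-Phe, then a stop codon
print(estimated_mass_kda(1200))    # the 1,200 bp example from the text
```

The mass estimate is deliberately crude: it reproduces the 110 Da per residue rule of thumb from the example rather than summing the actual residue masses.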
The simplest method, which works for bacterial and phage genes (but not for most eukaryotic genes, as we will see later), is to look for stretches of sequence that lack stop codons. These are known as “open reading frames” or ORFs. This works because 3 of the 64 possible codons are stop codons, so a random sequence should contain an average of about one stop codon in every 20 codons. Thus, the probability of a random occurrence of even a short open reading frame of, say, 100 codons without a stop codon is very small: $(61/64)^{100} \approx 8.2 \times 10^{-3}$

Identifying genes in DNA sequences from higher organisms is usually more difficult than in bacteria. This is because in humans, for example, gene coding sequences are separated by long sequences that do not code for proteins. Moreover, genes of higher eukaryotes are interrupted by introns, which are sequences that are spliced out of the RNA before translation. The presence of introns breaks up the open reading frames into short segments, making them much harder to distinguish from non-coding sequences. The maps below show 50 kbp segments of DNA from yeast, Drosophila, and humans. The dark grey boxes represent coding sequences and the light grey boxes represent introns. The boxes above the line are transcribed to the right and the boxes below are transcribed to the left. Names have been assigned to each of the identified genes. Although the yeast genes are much like those of bacteria (few introns and packed closely together), the Drosophila and human genes are spread apart and interrupted by many introns. Sophisticated computer algorithms were used to identify these dispersed gene sequences.

[Figure: gene maps of three 50 kbp DNA segments, each on a scale from 0 to 50.
Saccharomyces cerevisiae: RGD2, YFL046W, FET5, YFL040W, TUB2, RP041, YFL034W, HAC1, YFL030W, STE2, SEC53, YFL042C, YFL044C, ACT1, MOB2, YPT1, RPL22B, RIM15, CAF16, GYP8, CAK1, BST1, EPL1.
Drosophila melanogaster: CG16987, CG2964, CG15400, CG3131, CG3123, syt.
Human: LOC139168, HDAC6, PCSK1N, GATA1.]

To see how gene sequences are actually obtained, we will first need to consider some fundamentals of the chemical structure of DNA.
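Returning to the open reading frame search described earlier in this section, the idea can be sketched as a scan of all six reading frames for long stretches without a stop codon. This is a minimal illustration under stated assumptions, not the algorithm used to produce the gene maps: the names `prob_no_stop`, `reverse_complement`, and `find_orfs` are hypothetical, the input is assumed to contain only uppercase A, C, G, and T, and an ORF is taken to be any stop-free run of codons, as in the text, without requiring an ATG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def prob_no_stop(n_codons):
    """Probability that n random codons contain no stop codon: (61/64)^n."""
    return (61 / 64) ** n_codons


def reverse_complement(dna):
    """Return the opposite strand, written 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_codons=100):
    """Find stop-free stretches of at least min_codons codons in all six frames.

    Returns (strand, frame, start, end) tuples; start and end are indices on the
    strand being read, and end is the position just past the last codon of the run.
    """
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            run_start = frame
            for i in range(frame, len(seq) - 2, 3):
                if seq[i:i + 3] in STOP_CODONS:
                    if (i - run_start) // 3 >= min_codons:
                        orfs.append((strand, frame, run_start, i))
                    run_start = i + 3  # next run begins after the stop codon
            # A run may also extend to the end of the sequence without a stop.
            end = frame + ((len(seq) - frame) // 3) * 3
            if (end - run_start) // 3 >= min_codons:
                orfs.append((strand, frame, run_start, end))
    return orfs


print(prob_no_stop(100))
print(find_orfs("TAA" + "GCT" * 5 + "TAG", min_codons=5))
```

Raising `min_codons` trades sensitivity for specificity: by the probability above, a stop-free run of 100 codons is already unlikely in random sequence, and longer thresholds make chance ORFs rarer still.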
Each strand of DNA is directional. The different ends are usually called the 5’ and 3’ ends, referring to the different positions on the deoxyribose sugar ring where the linking phosphate residues attach.

In a double-stranded DNA molecule the two strands run antiparallel to one another, and the general structure can be diagrammed like this:

[Figure: double-stranded DNA with the strand ends labeled 5’ and 3’.]

• Note about representation of DNA sequences.
1) Single strands are always represented in the direction of synthesis, 5’ to 3’.
2) For double-stranded DNA, usually one strand is represented in the 5’ to 3’ direction. For a gene, the strand represented would correspond to the sequence of the mRNA.

DNA polymerases are the key players in the methods that we will be considering. The general reaction carried out by DNA polymerase is to synthesize a copy of a DNA template starting with the chemical precursors (nucleotides) dATP, dGTP, dCTP, and dTTP (dNTPs). All DNA polymerases have two fundamental properties in common.
(1) New DNA is synthesized only by elongation of an existing strand at its 3’ end.
(2) Synthesis requires nucleotide precursors, a free 3’ OH end, and a template strand.

A general substrate for DNA polymerase looks like this:

[Figure: polymerase substrate with the strand ends labeled 5’ and 3’.]

Note that the template strand can be as short as 1 base or as long as several thousand bases.

After addition of DNA polymerase and nucleotide precursors, this product will be readily synthesized:

[Figure: polymerase product with the strand ends labeled 5’ and 3’.]

DNA Sequencing

Consider a segment of DNA that is about 1000 base pairs long that we wish to sequence.
(1) The two DNA strands are separated. This is done by heating to 100˚C to melt the base-pairing hydrogen bonds that hold the strands together.
(2) A short oligonucleotide (ca. 18 bases) designed to be complementary to the end of one of the strands is allowed to anneal to the single-stranded DNA.
The resulting DNA hybrid looks much like the general polymerase substrate shown previously.
(3) DNA polymerase is added along with the four nucleotide precursors (dATP, dGTP, dCTP, and dTTP). The mixture is then divided into four separate reactions, and to each reaction a small quantity of a different dideoxynucleotide precursor is added. Dideoxynucleotide precursors are abbreviated ddATP, ddGTP, ddCTP, and ddTTP.
(4) The polymerase reactions are allowed to proceed and, using one of a variety of methods, radiolabel is incorporated into the newly synthesized DNA.
(5) After the DNA polymerase reactions are complete, the samples are melted and run on a gel system that allows DNA strands of different lengths to be

