LEHIGH CSE 397 - Computational Issues in Molecular Biology

CSE 397-497: Computational Issues in Molecular Biology
Lopresti · Spring 2004 · Lecture 24

Important points to remember
• Final paper / project is due by 5:00 pm on Friday, April 30.
• If you have questions about this, just ask.
• If you still owe me a scribe report, get it to me ASAP.
• Interested in giving an optional 5-minute presentation on your final project on last day of class? Let me know.
• For those who are interested, workshop on graduate program in bio-engineering to be held on May 19 – details to follow.

Today we step back in time
“The Invention of the Genetic Code,” Brian Hayes, American Scientist, vol. 86, no. 1, January-February 1998, pp. 8-14.
To understand theories of the time, most of which sounded good but ultimately proved wrong, we must forget almost everything we know about molecular biology ...
These early ideas had a strong computer science “flavor.”
It's interesting to look back and see what (very smart) people were thinking in mid-1950's, just after double helix structure of DNA was unraveled but we still had no idea how it all worked.

Genetic Code timeline #1
1865 Gregor Mendel, working alone in Austrian monastery, discovers that some characteristics are inherited in ‘units’.
1870 Friedrich Miescher isolates chemicals from cell nucleus, including ‘nucleic acids’.
However, most people are more interested in proteins in nucleus.
1879 Walter Flemming describes behavior of chromosomes during cell division, implicating these nuclear structures in inheritance.
1900 Hugo de Vries and others rediscover Mendel’s work and establish first laws of inheritance.
1909 Wilhelm Johannsen coins term ‘gene’.
1911 Thomas Hunt Morgan is first to show that genes are arranged in linear fashion along chromosomes.
http://www.wellcome.ac.uk/en/fourplus/DNA_timeline.html
Early work based on studying phenotypes. “Chromosome” is abstract concept – no one knows exactly what it is.

Genetic Code timeline #2
1928 Frederick Griffith uses chemical extract to convert harmless pneumonia bacteria into pathogenic forms, but nature of ‘inheritance factor’ is unknown.
1929 Phoebus Levene discovers that a sugar, deoxyribose, is present in nucleic acids. Later identifies that DNA is made up of nucleotides, a chemical unit comprising a deoxyribose sugar, a phosphate group and one of four small organic molecules known as bases.
1941 George Beadle & Edward Tatum show genes direct production of proteins.
1943 William Astbury makes first X-ray diffraction images of DNA.
1944 Building on Griffith’s work, Oswald Avery & colleagues show that DNA can ‘transform’ cells, cementing link between DNA and genes.
1950 Edwin Chargaff discovers patterns in amounts of four bases in DNA: amounts of G and C, and of A and T, are always same.
1951 Rosalind Franklin takes her first X-ray diffraction pictures.
1953 James Watson & Francis Crick publish first paper proposing double helix structure for DNA.
http://www.wellcome.ac.uk/en/fourplus/DNA_timeline.html

What was known in 1953?
http://www.accessexcellence.org/AB/GG/nhgri_PDFs/dna.pdf
• DNA composed of four nucleotides, A, C, G, T, forming double-stranded helix.
• A binds with T, C binds with G, hence, strands are reverse complements.
• DNA replicates itself during cell division (replication).
• Proteins composed of 20 amino acids.
• Protein production controlled by genes.
• DNA seems to be the genetic material.
But what is the connection???
... and ...

What wasn't known in 1953?
✗ No DNA sequences (had not been sequenced yet).
✗ Fragmentary information about protein sequences (insulin).
✗ Concept of RNA (including mRNA and tRNA).
✗ The Genetic Code – mapping from a four symbol alphabet to a 20 symbol alphabet – and how it is implemented.
?? ACGT... ??  →  Black Box  →  ?? Pro-Lys-... ??

Back in 1953
[Figure: central dogma diagram with parts labeled “Known”, “Known”, and “Unknown??”]
http://allserv.rug.ac.be/~avierstr/principles/centraldogma.html

What does this imply?
Pretend you're back in 1953 ...
Just to make it interesting:
• You don't know any sequence for a real DNA molecule. Sequences for a few proteins are just becoming available.
• You know that DNA is a double helix made up of two strands, each over a four symbol alphabet.
• Likewise, proteins are sequences over a 20 symbol alphabet.
• You believe that DNA is the genetic material.
• What's the connection between a DNA molecule and the proteins it is purported to produce?
• Anything you propose will be an abstract theory awaiting later experimental validation. But that's okay ...

Some additional context
At about the same time, information theory was coming into vogue. Claude Shannon joined Bell Labs in 1941 and soon started working on a fundamental approach for expressing information in a quantitative way.
The goal was to make information a measurable quantity, like density or mass.
http://www.nyu.edu/pages/linguistics/courses/v610003/shan.html
Surely nature is just as efficient as anything we could invent?
The repercussions were felt throughout science. Now we could talk, in a formal way, about coding theory, i.e., efficient schemes for storing and transmitting information.

Some preliminaries (should look familiar)
Already obvious, even without support of experimental data:
• DNA is sequence over a four symbol alphabet.
• Protein is sequence over a 20 symbol alphabet.
4^1 = 4 < 20 ... nope, not enough
4^2 = 16 < 20 ... nope, not enough
4^3 = 64 > 20 ... looks good!
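
The counting argument above (how many words of length k exist over a four symbol alphabet, compared with 20 amino acids) can be checked with a short Python sketch. It assumes fixed-length, non-overlapping codewords, which is an assumption of this illustration rather than something established at this point in the lecture; the function names `min_codeword_length` and `all_codewords` are hypothetical helpers introduced here.

```python
from itertools import product

DNA_ALPHABET = "ACGT"   # four symbol alphabet from the slides
NUM_AMINO_ACIDS = 20    # size of the protein alphabet from the slides


def min_codeword_length(alphabet_size: int, num_targets: int) -> int:
    """Return the smallest k with alphabet_size**k >= num_targets.

    Assumes fixed-length codewords (an assumption of this sketch).
    """
    if alphabet_size < 2:
        raise ValueError("alphabet must contain at least two symbols")
    k = 1
    while alphabet_size ** k < num_targets:
        k += 1
    return k


def all_codewords(alphabet: str, k: int) -> list[str]:
    """Enumerate every length-k word over the given alphabet."""
    return ["".join(letters) for letters in product(alphabet, repeat=k)]


if __name__ == "__main__":
    # Reproduce the three lines of arithmetic from the slide.
    for k in range(1, 4):
        count = len(all_codewords(DNA_ALPHABET, k))
        verdict = "enough" if count >= NUM_AMINO_ACIDS else "not enough"
        print(f"4^{k} = {count}: {verdict}")
    print("minimum length:",
          min_codeword_length(len(DNA_ALPHABET), NUM_AMINO_ACIDS))
```

Under these assumptions the shortest workable codeword has three letters, leaving 64 − 20 = 44 words unaccounted for by a one-to-one assignment.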

