MIT 6 047 - An introduction to the course, and an introduction to molecular biology


MIT OpenCourseWare
http://ocw.mit.edu

6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution
Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

Lecture 1

Aims: An introduction to the course, and an introduction to molecular biology

Administrative Details:
-There were 4 handouts in class: a course information handout, the first problem set (due on Sept. 15 at 8 pm), the scribe policy, and a course survey. All of these are on the web page.
-The first precept was on Friday 9/5 at 11: pm. The precept notes are posted on the web page.
-There are 3 textbooks; they are on reserve at both the MIT and BU libraries. The course uses 3 books because no single book covers all of the material in the class; in particular, no individual book covers both the algorithmic and the machine learning topics that will be addressed. Also, because computational biology is a rapidly changing field, books quickly become outdated.

The books are:
(1) Biological sequence analysis (Durbin, Eddy, Krogh, and Mitchison). The caveat regarding this book: it is about 10 years old, which is a long time in computational biology.
(2) An introduction to bioinformatics algorithms (Jones & Pevzner). As the title suggests, this book is a good resource for learning about bioinformatics algorithms.
(3) Pattern classification (Duda, Hart, and Stork). A machine learning book.

Why Computational Biology?
There are a number of reasons that it is appropriate and useful to apply computational approaches to the study of biological data.
-Many aspects of biology (such as sequence information) are fundamentally digital in nature. This means that they are well suited to computational modeling and analysis.
-New technologies (such as sequencing, and high-throughput experimental techniques like microarray, yeast two-hybrid, and ChIP-chip assays) are creating enormous and increasing amounts of data that can be analyzed and processed using computational techniques.
-Running time & memory considerations are critical when dealing with huge datasets. An algorithm that works well on a small genome (for example, that of a bacterium) might be too inefficient in time or space to be applied to 1000 mammalian genomes. Also, combinatorial questions dramatically increase algorithmic complexity.
-Biological datasets can be noisy, and filtering signal from noise is a computational problem.
-Machine learning approaches are useful to make inferences, classify biological features, & identify robust signals.
-It is possible to use computational approaches to find correlations in an unbiased way, and to come up with conclusions that transform biological knowledge. This approach is called data-driven discovery.
-Computational studies can suggest hypotheses, mechanisms, and theories to explain experimental observations. These hypotheses can then be tested experimentally.
-Computational approaches can be used not only to analyze existing data but also to motivate data collection and suggest useful experiments. Also, computational filtering can narrow the experimental search space to allow more focused and efficient experimental designs.
-Datasets can be combined using computational approaches, so that information collected across multiple experiments and using diverse experimental approaches can be brought to bear on questions of interest.
-Effective visualizations of biological data can facilitate discovery.
-Computational approaches can be used to simulate & model biological data.

Finding Functional Elements: A Computational Biology Question
We then discussed a specific question that computational biology can be used to address: how can one find functional elements in a genomic sequence?
The slide that is filled with letters shows part of the sequence of the yeast genome. Given this sequence, we can ask:
-What are the genes that encode proteins? How can we find features (genes, regulatory motifs, and other functional elements) in the genomic sequence?

These questions could be addressed either experimentally or computationally. An experimental approach would be to create a knockout (remove the element) and see whether the fitness of the organism is affected. We could also address the question computationally by checking whether the sequence is conserved across the genomes of multiple species. If the sequence is significantly conserved across evolutionary time, it’s likely to perform an important function.

There are caveats to both of these approaches. Removing the element may not reveal its function: even if the knockout shows no apparent difference from the original, this could be simply because the right conditions have not been tested. Also, simply because an element is not conserved doesn’t mean it isn’t functional.

(Also, note that “functional element” is an ambiguous term. Certainly, there are many types of functional elements in the genome that are not protein-encoding. Intriguingly, 90-95% of the human genome is transcribed (used as a template to make RNA). It isn’t known what the functions of most of these transcribed regions are, or indeed whether they are functional.)

Course Outline:
The first half of the course will cover foundational material; the second half will address open research questions. Each lecture will cover both biological problems and the computational techniques that can be used to study them. Some of the major topics that the course will address are:
1) Gene Finding
2) Sequence Alignment
3) Database Lookup: How can we effectively store and retrieve biological information?
4) Genome Assembly: How can we put together the small snippets of sequence produced by sequencing technologies into a complete genome?
5) Regulatory motif discovery: How can we find the sequence motifs that regulate gene expression?
6) Comparative genomics: How can we use the information contained in the similarities and differences between species’ genomes to learn about biological function?
7) Evolutionary Theory: How can we infer the relationships among species using the information contained in their genomes?
8) Gene expression analysis
9) Cluster Discovery: How can we find emergent features in the dataset?
10) Gibbs Sampling: How can we link clusters to the regulators responsible for the co-regulation?
11) Protein Network Analysis: How can we construct and analyze networks that represent the relationships among proteins?
12) Metabolic Modeling
13)
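
As an illustration of the conservation-based approach to finding functional elements described earlier in these notes, the sketch below scores each column of a set of already-aligned sequences and reports windows that are highly conserved. It is a toy example and not a method from the lecture: the function names (column_conservation, conserved_windows), the scoring rule (fraction of sequences sharing the most common base), the window size, the threshold, and the four short sequences are all assumptions made up for illustration. Real analyses would start from a genome alignment and use a statistical model of evolution.

```python
from collections import Counter


def column_conservation(aligned):
    """Per-column score: fraction of sequences sharing the most common base.

    `aligned` is a list of equal-length strings; '-' marks an alignment gap
    and never counts as agreement.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*aligned):
        bases = [c for c in column if c != "-"]
        if not bases:
            scores.append(0.0)
            continue
        top_count = Counter(bases).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_windows(scores, window=5, threshold=0.9):
    """Start indices of windows whose mean conservation is >= threshold."""
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            hits.append(start)
    return hits


# Made-up alignment of one region in four hypothetical species.
toy_alignment = [
    "ACGTACGTAA",
    "ACGTACCTTA",
    "ACGTAGGTCA",
    "ACGTATGTGA",
]

scores = column_conservation(toy_alignment)
print(scores)
print(conserved_windows(scores, window=4, threshold=1.0))  # prints [0, 1]
```

In this toy input the first five columns are identical in all four sequences, so the 4-column windows starting at positions 0 and 1 are flagged. As the caveat in the notes says, a low score would not by itself show that a region is non-functional.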

