A BioInformatics Survey . . . some taste of theory, and a few practicalities
To begin, some terminology:
My definitions, lots of overlap:
And one way to think about it:
The exponential growth of molecular sequence databases
A little history:
Sequence database organization:
More format complications:
Specialized ‘sequence’-type databases:
And still other types of bioinfo’ databases:
Enter pairwise alignment, similarity searching, significance, and homology.
Noise due to random composition effects contributes to confusion. To ‘clean up’ the plot consider a filtered windowing approach. A dot is placed at the middle of a window if some ‘stringency’ is met within that defined window size. Then the window is shifted one position and the entire process is repeated (zero:one match score, window of size three and a stringency level of two out of three).
But:
An oversimplified path matrix example:
What about proteins: conservative replacements and similarity as opposed to identity. The nitrogenous bases are either the same or they’re not, but amino acids can be similar, genetically, evolutionarily, and structurally! The BLOSUM62 table (Henikoff and Henikoff, 1992).
So, first, significance: when is any alignment worth anything biologically?
The Expectation Value!
Rules of thumb for a protein search:
On to the searches: How can you search the databases for similar sequences, if pairwise alignments take N² time?! Significance and heuristics . . .
Corn beef hash? Huh . . .
OK. Heuristics . . . What’s that?
Two predominant versions exist: BLAST and Fast
The BLAST and Fast programs: some generalities
The algorithms, in brief:
BLAST: the algorithm in more detail
The BLAST algorithm, continued
The Fast algorithm: in more detail
The Fast algorithm, continued
The Fast algorithm, still continued
What’s the deal with DNA versus protein for searches and alignment?
On to multiple sequence alignment & analysis:
Dynamic programming’s complexity increases exponentially with the number of sequences being compared:
‘Global’ heuristic solutions of the N-dimensional matrix:
Multiple Sequence Dynamic Programming:
Web resources for pairwise, progressive multiple alignment
So, what else is available?
Reliability and the Comparative Approach:
Structural & Functional correspondence in the Wisconsin Package’s SeqLab:
As with pairwise methods, work with proteins! If at all possible:
Beware of aligning apples and oranges [and grapefruit]!
Mask out uncertain areas:
Complications:
References:

A BioInformatics Survey . . . some taste of theory, and a few practicalities
Steve Thompson
Florida State University School of Computational Science (SCS)
BCH 5405 Molecular Biology & Biotechnology
Dr. Qing-Xiang (Amy) Sang
Mon. & Wed., March 24 & 26, 2008

To begin, some terminology:
What is bioinformatics, genomics, proteomics, sequence analysis, computational molecular biology . . . ?

My definitions, lots of overlap:
Biocomputing and computational biology are synonyms and describe the use of computers and computational techniques to analyze any type of a biological system, from individual molecules to organisms to overall ecology.
Bioinformatics describes using computational techniques to access, analyze, and interpret the biological information in any type of biological database.
Sequence analysis is the study of molecular sequence data for the purpose of inferring the function, interactions, evolution, and perhaps structure of biological molecules.
Genomics analyzes the context of genes or complete genomes (the total DNA content of an organism) within the same and/or across different genomes.
Proteomics is the subdivision of genomics concerned with analyzing the complete protein complement, i.e. the proteome, of organisms, both within and between different organisms.

And one way to think about it: the Reverse Biochemistry Analogy
Biochemists no longer have to begin a research project by isolating and purifying massive amounts of a protein from its native organism in order to characterize a particular gene product. Rather, now scientists can amplify a section of some genome based on its similarity to other genomes, sequence that piece of DNA and, using sequence analysis tools, infer all sorts of functional, evolutionary, and, perhaps, structural insight into that stretch of DNA!
The computer and molecular databases are a necessary, integral part of this entire process.
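
As a small illustration of the filtered windowing approach to dot plots mentioned in the slide titles (zero:one match score, window of size three, stringency of two out of three), here is a minimal Python sketch. The function name windowed_dotplot and its parameters are made up for this example and do not come from the lecture or from any particular package; it only shows the idea of sliding a window along every diagonal and placing a dot at the window's middle when enough positions match.

```python
def windowed_dotplot(seq_a, seq_b, window=3, stringency=2):
    """Return the set of (i, j) window-centre positions that earn a dot.

    Scoring is zero:one (1 for an identical pair, 0 otherwise). A dot is
    placed at the middle of a window when at least `stringency` of the
    `window` aligned positions match. Hypothetical helper for illustration.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    if not 1 <= stringency <= window:
        raise ValueError("stringency must be between 1 and window")

    half = window // 2
    dots = set()
    # Shift the window one position at a time over both sequences.
    for i in range(half, len(seq_a) - half):
        for j in range(half, len(seq_b) - half):
            matches = sum(
                1
                for k in range(-half, half + 1)
                if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the plot as text: rows follow seq_a, columns follow seq_b."""
    rows = []
    for i in range(len(seq_a)):
        rows.append("".join("*" if (i, j) in dots else "." for j in range(len(seq_b))))
    return "\n".join(rows)


if __name__ == "__main__":
    a, b = "ACGT", "AGGT"
    print(render(a, b, windowed_dotplot(a, b)))
```

With the default two-out-of-three stringency, the single mismatch between the two toy sequences still leaves the main diagonal visible; raising the stringency to three removes those dots, which is the trade-off between noise and sensitivity that the filter controls.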



FSU BCH 5405 - Lecture Notes
