CMSC423: Bioinformatic Algorithms, Databases and Tools

What you missed

Protein folding
• Note: misfolded proteins may cause disease (e.g. Creutzfeldt-Jakob disease, whose variant form is linked to mad cow disease)
• Drugs (e.g. antibiotics) often inhibit protein function – knowing the structure can help design drugs
• Folding@home – lend your computer’s unused cycles to help fold proteins (like SETI@home) (do you believe in evolution or aliens?)

Protein structure (primary structure = sequence)
http://www.tulane.edu/~biochem/med/second.htm
[Figure labels: backbone dihedral angles φ, ψ]

Secondary structure (motifs)
helix, sheet, turn
http://alpha2.bmc.uu.se/~kenth/bioinfo/structure/secondary/01.html

Tertiary structure (3D shape)
http://www.umass.edu/microbio/rasmol/sayle1.htm

Folded shape: lowest free energy
• Energy components
– electrostatic (~1/D²) (n² terms)
– van der Waals (n² terms)
– hydrogen bonding (n terms)
– “bending” (n terms)
– solvent (water/salt) (?? terms)
– exclusion principle (no two atoms share the same volume)
• Energy minimization
– small perturbations & computation: hill climbing, simulated annealing, etc.
• Molecular dynamics

How do we know the truth?
• X-ray crystallography
– crystallize protein
– shine X-rays
– examine diffraction patterns
• Nuclear Magnetic Resonance (NMR)
– no crystallization necessary
– magnetic field “vibrates” hydrogen atoms
– Nobel prize: Kurt Wüthrich
http://www.cryst.bbk.ac.uk/BBS/whatis/cryst_an.html
http://www.cryst.bbk.ac.uk/PPS2/projects/schirra/html/2dnmr.htm

Simpler problems
• Secondary structure prediction
• Side-chain conformation (assuming fixed backbone)
• Protein docking (how do proteins interact)
• Database searches (protein threading)
• Simpler energy functions
• Folding on a lattice (theoretical approximation)
• Critical Assessment of Fully Automated Structure Prediction – competition on proteins with unpublished 3D structure

Proteomics
• Large-scale analysis of proteins
– protein-protein interactions (e.g. yeast 2-hybrid)
– 2D gels (mass vs. isoelectric point)
– Mass spectrometry
– Protein microarrays
– etc.

Proteomics
• Why proteomics?
Are DNA/RNA microarrays not sufficient?
• RNA abundance is not necessarily related to protein abundance
• Many proteins are modified post-translation
– attachment of additional molecules (phosphate, sugars, etc.)
– formation of complexes (hemoglobin is actually a complex of 4 protein chains)

Mass spectrometry
• Technique for measuring the mass-to-charge ratio of ions
• Basic idea
– shoot ions into a magnetic field
– deflection depends on mass (strictly, on the mass-to-charge ratio)
• Output of a mass spectrometer
– ions “sorted” by mass
– for each mass bucket: number of ions with that specific mass

Mass spectrometry
http://www.cem.msu.edu/~reusch/VirtualText/Spectrpy/MassSpec/masspec1.htm

Tandem mass spectrometry
• First mass spectrometer “focuses” on a specific protein
• Second mass spectrometer breaks the protein into smaller chunks
• Problem: given the chunks, what was the original protein?

Peptide sequencing
• Peptide – a chunk of a protein, usually obtained by enzymatic cleavage of the protein (using trypsin)
• Problem: given an MS spectrum (weights of fragments), what was the sequence of the peptide?
• Or: find the peptide (of mass m) that best matches the experimental data

Biological networks
• Genes/proteins do not exist in isolation
• Interactions between genes or proteins can be represented as graphs
• Examples:
– metabolic pathways
– regulatory networks
– protein-protein interactions (e.g.
yeast 2-hybrid)
– genetic interactions (synthetic lethality)

Metagenomics

Why do we care?
• Bacteria are everywhere in the environment
• They are not all evil
• Bacteria can be quite useful
– Bio-energy
– Bio-remediation
– Drug development: antibiotics, anti-cancer

Human microbiome
• Humans carry about 1 order of magnitude more bacterial cells than human cells
– critical to infant development (immune system, GI tract)
– provide essential nutrients (vitamin K, B12, essential amino acids, ...)
– help digest complex molecules – starches, plant material
– imbalances in normal bacterial populations correlate with disease (IBD, colon cancer, ...)

Human microbiome project
nihroadmap.nih.gov/hmp/

Some challenges on real data
[Figure: counts per sample (axis 0 to 25000), broken down by taxonomic group. Samples: 2508P1D13, 2508P1D24, 2508P3D13, 2508P3D24, 2510P1D13, 2510P1D24, 2510P2D13, 2510P2D24, 2504P1D24, 2504P2D24, 2503P2D24, 2503P3D24. Groups: ZZZ_UNKNOWN, Verrucomicrobiae, Spirochaetes, Gammaproteobacteria, Fusobacteria, Flavobacteria, Erysipelotrichi, Epsilonproteobacteria, Deltaproteobacteria, Clostridia, Betaproteobacteria, Bacteroidetes, Bacilli, Alphaproteobacteria, Actinobacteria. Labels: P1, P2, P3, P4]

Spatial genomics

Voxelation
• Brown, V.M., et al., High-throughput imaging of brain gene expression. Genome Res, 2002. 12(2): p. 244-54.
• http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11827944
• Brown, V.M., et al., Multiplex three-dimensional brain gene expression mapping in a mouse model of Parkinson's disease. Genome Res, 2002. 12(6): p. 868-84.
• http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12045141
• Gene expression information in a spatial context
• Combines microarray analysis with computer graphics

Vanessa M. Brown et al. Genome Res. 2002; 12: 868-884
Figure 2 Voxelation scheme
• Mouse brain cut up into voxels
• Run a separate microarray experiment on each voxel

Vanessa M. Brown et al. Genome Res. 2002; 12: 868-884
Figure 4 Spatial gene expression patterns for the subset of correlated genes

Vanessa M. Brown et al. Genome Res.
2002; 12: 868-884
Figure 7 SVD delineates anatomical regions of the brain

Vanessa M. Brown et al. Genome Res. 2002; 12: 868-884
Figure 5 Putative regulatory elements shared between groups of correlated and anticorrelated
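
The Figure 7 caption says that SVD delineates anatomical regions. The sketch below illustrates that idea only; it is not the analysis from Brown et al. It assumes a small, made-up genes × voxels matrix, mean-centers each gene, and approximates the first right singular vector by power iteration. The names `expression`, `center_rows` and `leading_voxel_vector` are hypothetical.

```python
import math

# Synthetic genes x voxels expression matrix (made-up numbers, not data from the paper).
# Voxels 0-2 stand in for one brain region, voxels 3-5 for another.
expression = [
    [9.0, 8.0, 9.0, 1.0, 2.0, 1.0],  # gene high in the first region
    [8.0, 9.0, 8.0, 2.0, 1.0, 2.0],  # gene high in the first region
    [1.0, 2.0, 1.0, 9.0, 8.0, 9.0],  # gene high in the second region
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],  # gene with no spatial pattern
]


def center_rows(matrix):
    """Subtract each gene's mean so the SVD sees spatial variation, not baseline level."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    return centered


def leading_voxel_vector(matrix, iterations=100):
    """Approximate the first right singular vector by power iteration on A^T A."""
    n_genes, n_voxels = len(matrix), len(matrix[0])
    # Deterministic, non-constant start (a constant vector is annihilated by row centering).
    v = [1.0 / (j + 1) for j in range(n_voxels)]
    for _ in range(iterations):
        # u = A v: one value per gene
        u = [sum(matrix[g][j] * v[j] for j in range(n_voxels)) for g in range(n_genes)]
        # w = A^T u: one value per voxel
        w = [sum(matrix[g][j] * u[g] for g in range(n_genes)) for j in range(n_voxels)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("matrix has no variation to decompose")
        v = [x / norm for x in w]
    return v


v = leading_voxel_vector(center_rows(expression))
# Voxels whose entries share a sign are placed in the same group.
groups = [x > 0 for x in v]
print(groups)
```

On this toy input the first three voxels fall into one group and the last three into the other, so the sign pattern of the leading singular vector recovers the two synthetic regions. With real data one would normally use a library SVD routine and inspect several singular vectors rather than only the first.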