Structures from scratch
Rhiju Das, Departments of Biochemistry and Physics
BIOC 218, Feb 2010

Predicting protein structure
GTPDIIVNAQINS EDENVLDFIIEDEY YLKKRGVGAHIIK VASSPQLRLLYKN AYSTVSCGNYGVL CNLVQNGEYDLN AIMFNCAEIKLNK GQMLFQTKIWR
This will happen to you a lot.
Proteins

Two fundamental problems
1. Predicting protein structure
2. Predicting RNA structure: ugcuccuaguacgag aggaccggagug

Driving innovation in protein structure prediction: CASP (Critical Assessment of Structure Prediction)
Five blind predictions per target.
CASP1 (1994): RMSD 16.0 Å. (From Neil Clarke, CASP7 assessor's talk.)
CASP3 (1998): RMSD 4 to 6 Å. (David Baker and colleagues.)
CASP6 (2004): T0281, 1.6 Å over 70 residues. (David Baker and colleagues.)

De novo modeling with Rosetta
Stage I: Fragment assembly
Stage II: All-atom refinement

Ingredients of a high-resolution potential
1. Van der Waals packing
2. Hydrogen bonds
3. Manifestations of water: the cost of desolvation (polar atoms); the hydrophobic effect (non-polar atoms)
4. Torsional potential
(Michael Levitt, 1969)

Rosetta in action
A 1000-fold increase in computational power (Rosetta@home).
[Plot: all-atom energy vs. Cα RMSD to native structure; native CheY]
Lowest-energy Rosetta structure in CASP7
[Plot: number of top-3 votes vs. number of groups; Rosetta@home, CASP7 predictors, expected by chance. From Neil Clarke, CASP7 assessor's talk on free modeling.]

De novo successes (all)
CASP7 target T0316, domain 3. Native vs. model: 2.0 Å over 61 residues.
CASP7 target T0283, 112 residues. Native vs. model: 1.4 Å over 90 residues.

De novo modeling: connections to the real world
The crystallographic phase problem
Engineering new protein folds and new enzymes
Non-biological polymers (beta proteins)

Hype vs. reality
Is protein folding solved? NO.
[Figure: native vs. model]
Success in 1/3 of cases.
Conformational sampling is still a huge issue.

Can you pick out the right one? (T304, CASP7)
[Figure: crystallographic model vs. best CASP model]
A symptom of poor conformational sampling.

Two fundamental problems
1. Predicting protein structure
2. Predicting RNA structure
Proteins, RNA

How a physicist got into biochemistry
2000: a flourishing RNA world
Engineered ribozymes and aptamers
Riboswitches
Conserved non-coding RNA: conserved cloverleaf RNAs (Breaker and colleagues, 2007); Human Accelerated Region 1 RNA (Haussler et al., 2006)

The Das Lab
Goal: Nucleic Acid Structures You Can Trust
ugcuccu aguacga gaggacc ggagug
With de novo protein structure modeling as an inspiration, how far can we get with computers?

Words and grammar for RNA
GACACUAAGUUCGGCA UCAAUAUGGUGACCUC CCGGGAGCGGGGGACC ACCAGGUUGCCUAGAG GGGUGAACCGGCCCAG GUCGGAAACGGAGCAG GUCAAAACUCCCGUGC UGAUCAGUAGUGU
Signal Recognition Particle RNA (Oubridge et al., 2002)
Canonical double helices
Non-canonical regions

De novo modeling
Fragment Assembly of RNA (FARNA)
Ingredients of a high-resolution potential
1. Van der Waals packing
2. Hydrogen bonds
3. Manifestations of water: the cost of desolvation (polar atoms); the hydrophobic effect (non-polar atoms)
(Michael Levitt, "Detailed molecular model of transfer RNA", Nature 1969)

Does it work? Native state discrimination
The most conserved region of the signal recognition particle.
[Plots: low-resolution FARNA energy and high-resolution energy; native-like conformations vs. non-native decoys]

Can we decipher all the known words?
De novo modeling, native vs. model: 1.4 Å rmsd, 1.4 Å rmsd, 1.7 Å rmsd.
In half the cases, de novo modeling achieves 2.0 Å structures and selects them.
The biggest bottleneck: conformational sampling (1.0 Å rmsd).
We know the rules of the game, but we have to play it better.

A universal obsession: beating the astronomical conformational sampling problem

Solution 1: Data
[Figure: SHAPE w/ adenine; model vs. native, residues labeled 13 to 83]

Solution 2: Humans
Foldit: Baker lab, with UW Comp Sci (Adrien Treuille, Seth Cooper, Zoran Popovic, David Salesin, others).
EteRNA: with Adrien Treuille (now at Carnegie Mellon) and Jeehyung Lee.

Solution 3: Physics
Computationally expensive, but getting faster.
Still no case of blind predictions of structure.

Solution 4: Better algorithms
Step-by-step sampling
[Animation: starting from the G3-C10 and C4-G9 base pairs, loop nucleotides are added one at a time, joined through phosphates (P): G5, then A8, then A7, then C6]
1ZIH (NMR) vs. lowest-energy model.
Aha: terms for base stacking and the RNA torsional potential had been dialed down to zero, a legacy of fragment assembly.

Wait, there's still a cheat.
There are other pathways: 2N total.
How to sample all paths? Steal a trick from dynamic programming, i.e. recursion:
Nucleic acid sequence alignment
Electrophoretic trace alignment
2° structure
Ordering primers for PCR assembly for the least
[Figure: overlapping primer sequences for PCR assembly, with position numbers]
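
The "other pathways" remark and the dynamic-programming trick above can be illustrated with a toy model. What follows is only a sketch under stated assumptions, not the Das lab's code: it assumes a loop of n nucleotides is grown one residue at a time, each step extending either the 5′ side or the 3′ side, and it reads the slide's "2N" as 2^N (a superscript may have been lost in extraction). The function names `count_build_paths` and `count_intermediate_states` are hypothetical. The point is that the number of build orders grows exponentially, while the number of distinct intermediates (how many residues are built on each side) grows only quadratically, so a memoized recursion needs to visit each intermediate once.

```python
from functools import lru_cache


def count_build_paths(n: int) -> int:
    """Count the orders in which n loop residues can be added,
    one at a time, from either the 5' side or the 3' side (toy model)."""

    @lru_cache(maxsize=None)
    def paths(left: int, right: int) -> int:
        # left  = residues already built from the 5' side
        # right = residues already built from the 3' side
        if left + right == n:
            return 1  # loop is complete: exactly one way to "finish"
        # Next residue is added on the 5' side or on the 3' side.
        return paths(left + 1, right) + paths(left, right + 1)

    return paths(0, 0)


def count_intermediate_states(n: int) -> int:
    """Count distinct (left, right) intermediates with left + right <= n.
    These are the subproblems a dynamic-programming scheme would store."""
    return sum(1 for left in range(n + 1) for right in range(n + 1 - left))


if __name__ == "__main__":
    for n in (2, 6, 20):
        print(n, count_build_paths(n), count_intermediate_states(n))
```

In this toy model a 20-residue loop has over a million build orders but only 231 distinct intermediates, which is why caching results per intermediate, rather than per path, makes covering all pathways tractable.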