Stanford CS 262 - Protein Structure Prediction—Overview

Protein Structure Prediction: Overview

Outline
- Structure determines function
- The protein folding problem
  - What determines structure? Energy, kinematics
  - How can we determine structure? Experimental methods, computational predictions

Primary Structure: Sequence
The primary structure of a protein is the amino acid sequence.

Secondary Structure
Loops, helices, and sheets. Helices and sheets are stabilized by hydrogen bonds between backbone oxygen and hydrogen atoms.

"Second-and-a-half-ary" Structure: Motifs
- Beta helix
- Beta barrel
- Beta trefoil

Tertiary Structure: Domains
- Mosaic proteins
- Protein folds
- Composed of other ...

Quaternary Structure: Multimeric Proteins or Functional Assemblies
- Multimeric proteins: hemoglobin (a tetramer)
- Macromolecular assemblies: ribosome (protein synthesis), replisome (DNA copying)

Protein Folding
- The amino acid sequence of a protein determines the 3D fold (Anfinsen et al., 1950s). Some exceptions:
  - All proteins can be denatured
  - Some proteins have multiple conformations
  - Some proteins get folding help from chaperones
- The function of a protein is determined by its 3D fold
- Can we predict the 3D fold of a protein given its amino acid sequence?

Quick Overview of Energy

  Bond                          Strength (kcal/mole)
  H-bonds                       3-7
  Ionic bonds                   10
  Hydrophobic interactions      1-2
  Van der Waals interactions    1
  Disulfide bridge              51

The Hydrophobic Effect
Important for folding, because every amino acid participates.

Experimentally determined hydrophobicity levels (Fauchere and Pliska, 1983, Eur. J. Med. Chem. 18: 369-75):

  Residue   Level      Residue   Level
  Trp        2.25      Thr        0.26
  Ile        1.80      His        0.13
  Phe        1.79      Gly        0.00
  Leu        1.70      Ser       -0.04
  Cys        1.54      Gln       -0.22
  Met        1.23      Asn       -0.60
  Val        1.22      Glu       -0.64
  Tyr        0.96      Asp       -0.77
  Pro        0.72      Lys       -0.99
  Ala        0.31      Arg       -1.01

Protein Structure Determination
- Experimental: X-ray crystallography, NMR spectrometry
- Computational: structure prediction (the Holy Grail). Sequence implies structure; therefore, in principle, we can predict the structure from the sequence alone.

Protein Structure Prediction
- Ab initio: use just first principles (energy, geometry, and kinematics)
- Homology: find the best match to a database of sequences with known 3D structure
- Combinations
- Threading
- Meta-servers and other methods

Ab initio Prediction
- Sampling the global conformation space
  - Lattice models / discrete-state models
  - Molecular dynamics
- Picking native conformations with an energy function
  - Solvation model: how the protein interacts with water
  - Pair interactions between amino acids
- Predicting secondary structure
  - Local homology
  - Fragment libraries

Ab initio Prediction: ROSETTA
1. PSI-BLAST: homology search. Discard sequences with >25% homology.
2. PHD: for each 3-long and each 9-long sequence fragment, get 25 structure fragments that match well.
3. Markov Chain Monte Carlo method: iteratively insert and remove one short structure fragment at a time.

Ab initio Prediction: CASP Results

Only a Few Folds Are Found in Nature

The SCOP Database (Structural Classification Of Proteins)
- FAMILY: proteins that are >30% similar, or >15% similar and have similar known structure/function
- SUPERFAMILY: proteins whose families have some sequence and function/structure similarity, suggesting a common evolutionary origin
- COMMON FOLD: superfamilies that have the same secondary structures in the same arrangement, probably resulting from physics and chemistry
- CLASS: alpha, beta, alpha/beta, alpha+beta, multidomain

Status of Protein Databases
Databases: PDB, EMBL, SCOP (Structural Classification of Proteins).

  Class                                  Folds   Superfamilies   Families
  All alpha proteins                       202             342        550
  All beta proteins                        141             280        529
  Alpha and beta proteins (a/b)            130             213        593
  Alpha and beta proteins (a+b)            260             386        650
  Multi-domain proteins                     40              40         55
  Membrane and cell surface proteins        42              82         91
  Small proteins                            72             104        162
  Total                                    887            1447       2630

Evolution of Proteins: Domains
- The number of members in different families obeys a power law
- 429 families are common to all 14 eukaryotes: 80% of animal domains, 90% of fungi/plant domains
- 80% of proteins are multidomain in eukaryotes; domains usually combine pairwise in the same order
- Evolution of proteins happens mainly through duplication, recombination, and divergence

Homology-based Prediction
- Align the query sequence with sequences of known structure, usually >30% similar
- Superimpose the aligned sequence onto the structure template, according to the computed sequence alignment
- Perform local refinement of the resulting structure
- The number of unique structural folds is small (possibly a few thousand)
- 90% of new structures submitted to PDB in the past three years have similar folds in PDB

Homology-based Prediction: Steps
1. Raw model
2. Loop modeling
3. Side chain placement
4. Refinement

Examples of Fold Classes

Homology-based Prediction and Threading
Query sequence: MTYKLILN ... NGVDGEWTYTE
- Main difference between homology-based prediction and threading: threading uses the structure to compute the energy function during alignment
- Threading is in between homology-based prediction and molecular modeling

Threading: Overview
1. Build a structural template database
2. Define a sequence-structure energy function
3. Apply a threading algorithm to the query sequence
4. Perform local refinement of secondary structure
5. Report the best resulting structural model

Threading: Template Database
- FSSP, SCOP, CATH
- Remove pairs of proteins with highly similar structures
  - Efficiency
  - Statistical skew in favor of large families

Threading: Energy Function
Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
- Ep: how preferable it is to put two particular residues nearby
- Eg: alignment gap penalty
- Es: how well a residue fits a structural environment
- Em: how often a residue mutates to the template residue
- Ess: compatibility with local secondary structure prediction
- Total energy: E = wm*Em + ws*Es + wp*Ep + wg*Eg + wss*Ess

Threading: Formulation
[Figure: contact graph over cores C1, C2, C3, C4]
- The contact graph captures amino acid interactions
- Cores represent important local structure units
- No gaps within each core

How Hard Is Threading?
- At least as hard as MAX-CUT
- MAX-CUT: given a graph G = (V, E), find a cut (S, T) of V with the maximum number of edges between S and T
- The bad news: APX-complete even when each node has at most B edges, where B > 2
[Figure: example graph on nodes 1 to 7]
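
The total threading energy listed under the Threading Energy Function slide is a weighted sum of five terms. A minimal sketch of that sum follows; the class name, field names, and all numeric values are hypothetical and chosen only for illustration, and the slides do not say how the individual terms or weights are computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadingTerms:
    """Five per-alignment quantities (hypothetical field names)."""
    mutation: float    # Em: residue-to-template-residue mutation term
    singleton: float   # Es: fit of a residue to its structural environment
    pairwise: float    # Ep: preference for two residues being nearby
    gap: float         # Eg: alignment gap penalty
    secondary: float   # Ess: agreement with predicted secondary structure


def total_energy(terms: ThreadingTerms, weights: ThreadingTerms) -> float:
    """Return wm*Em + ws*Es + wp*Ep + wg*Eg + wss*Ess."""
    return (
        weights.mutation * terms.mutation
        + weights.singleton * terms.singleton
        + weights.pairwise * terms.pairwise
        + weights.gap * terms.gap
        + weights.secondary * terms.secondary
    )


# Illustrative values only; real terms come from an alignment and a template.
example_terms = ThreadingTerms(1.0, 2.0, 3.0, 4.0, 5.0)
example_weights = ThreadingTerms(2.0, 0.0, 1.0, 0.0, 0.0)
example_total = total_energy(example_terms, example_weights)
```

Reusing one dataclass for both terms and weights keeps the pairing of each weight with its term explicit.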
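
To make the MAX-CUT definition concrete, here is a brute-force sketch that tries every split of the nodes into two sides and counts the edges crossing the split. The example graph is hypothetical (the graph from the slide figure is not recoverable), and exhaustive search is only practical for very small graphs.

```python
from itertools import product


def max_cut(num_nodes, edges):
    """Return (best_size, best_side) over all 2**num_nodes node labelings.

    best_side[i] is 0 or 1 and says which side of the cut node i is on.
    """
    best_size, best_side = -1, None
    for side in product((0, 1), repeat=num_nodes):
        # An edge is cut when its endpoints are on different sides.
        size = sum(1 for u, v in edges if side[u] != side[v])
        if size > best_size:
            best_size, best_side = size, side
    return best_size, best_side


# Hypothetical example: a 5-cycle (nodes 0..4) plus the chord 0-2.
EXAMPLE_EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
example_size, example_side = max_cut(5, EXAMPLE_EDGES)
```

The running time doubles with every added node, which is consistent with the hardness result stated above: no efficient exact method is expected.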
Reduction of MAX-CUT to Threading
[Figure: graph on nodes 1 to 7; sequence 01 01 01 01 01 01 01; cores v1, v2, v3, v4, v5, v6, v7]
- The sequence consists of |V| "01" pairs
- There are |V| cores; each core i has length 1 and corresponds to vi
- Let Ep(0, 1) = 1: every edge labeled 0-1 or 1-0 gets a score of 1
- Then, size of cut = threading score

Integer Programming Formulation

  Linear program:
    maximize     z = 6x + 5y        (linear function)
    subject to   3x + y <= 11
                 -x + 2y <= 5       (linear constraints)
                 x, y >= 0

  Integer program: the linear program plus
                 x, y integer       (integral constraints; nonlinear)

RAPTOR: integer-programming-based threading, perhaps the
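
The reduction from MAX-CUT to threading can be checked on a small case. The sketch below enumerates every order-preserving placement of |V| length-1 cores onto a sequence of |V| "01" pairs, scores each placement by the number of graph edges whose two cores landed on different symbols, and compares the best score with a brute-force maximum cut. It assumes that equal labels (0-0 or 1-1) score 0, which the slide does not state, and it uses a hypothetical example graph.

```python
from itertools import combinations, product


def best_threading_score(num_nodes, edges):
    """Best score over all placements of the cores on the sequence "01" * num_nodes."""
    sequence = "01" * num_nodes
    best = 0
    # combinations() yields increasing position tuples, so core order is kept
    # and no two length-1 cores share a position.
    for positions in combinations(range(len(sequence)), num_nodes):
        labels = [sequence[p] for p in positions]
        # Assumed scoring: 1 for a 0-1 or 1-0 edge, 0 otherwise.
        score = sum(1 for u, v in edges if labels[u] != labels[v])
        best = max(best, score)
    return best


def max_cut_size(num_nodes, edges):
    """Brute-force maximum cut size, for comparison."""
    return max(
        sum(1 for u, v in edges if side[u] != side[v])
        for side in product("01", repeat=num_nodes)
    )


# Hypothetical example: a 5-cycle (nodes 0..4) plus the chord 0-2.
EXAMPLE_GRAPH = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
threading_score = best_threading_score(5, EXAMPLE_GRAPH)
cut_size = max_cut_size(5, EXAMPLE_GRAPH)
```

The two values agree because placing core i on either symbol of the i-th "01" pair realizes any desired 0/1 labeling of the nodes, and every placement induces some labeling.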
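
The small integer program in the Integer Programming Formulation can be solved by enumeration and compared with its linear relaxation. This sketch reads the constraints as 3x + y <= 11 and -x + 2y <= 5 with x, y >= 0; the comparison operators and the minus sign are an assumption, since they are not legible in the preview text. It is not how RAPTOR solves threading, only an illustration of the gap between a linear program and its integer version.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint (a, b, c) means a*x + b*y <= c (signs assumed, see above).
CONSTRAINTS = [(3, 1, 11), (-1, 2, 5), (-1, 0, 0), (0, -1, 0)]


def objective(x, y):
    return 6 * x + 5 * y


def feasible(x, y):
    return all(a * x + b * y <= c for a, b, c in CONSTRAINTS)


def integer_optimum(limit=12):
    """Enumerate integer points in [0, limit) x [0, limit) and keep the best."""
    points = [(x, y) for x in range(limit) for y in range(limit) if feasible(x, y)]
    return max(points, key=lambda p: objective(*p))


def lp_relaxation_optimum():
    """Best feasible vertex; a bounded LP attains its optimum at a vertex."""
    vertices = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines never intersect
        # Cramer's rule with exact rational arithmetic.
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        if feasible(x, y):
            vertices.append((x, y))
    return max(vertices, key=lambda p: objective(*p))


ip_point = integer_optimum()
lp_point = lp_relaxation_optimum()
```

Under these assumed constraints the relaxation's optimum is fractional, so rounding it is not guaranteed to give the integer optimum; that is the reason the integrality constraints make the problem harder.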

