Bioinformatics of Proteins
- Atomic Properties
- The Folding Problem
- Structure Alignments
- Structure Prediction
Reza Jacob
4 June 2001
Biochemistry 118Q

Proteins in Bioinformatics
- How do we represent structures for computation?
- How do we compare structures in silico?
- How do we classify structures hierarchically?

The Plan
- Apply constraints of chemistry
  - Bond lengths
  - Bond angles
  - Dihedral (torsion) angles
- Place in a coordinate frame
  - Cartesian
  - Internal
  - Object-based frames
- Compare structures with i discrete components
  - Root mean squared deviation (RMSD)

Basic Measurements
- Bond lengths
- Bond angles
- Dihedral (torsion) angles

Bond Length
- Bond length is essentially fixed in any given scenario
- Depends on the type of bond (single, double, triple; hybridization too)
- Depends on which two atoms are bonded
  - C-H is 1.0 Angstroms
  - C-C is 1.5 Angstroms

Bond Length
- Bond length is a function of the spatial positions of the two atoms
- Bond length is the Euclidean distance: for (x1, y1, z1) and (x2, y2, z2),
  $d = \left[(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2\right]^{1/2}$
- Some non-covalent distances are also constant in a peptide's backbone
  - The C-alpha to C-alpha distance for consecutive amino acids is constant too, because of dihedral constraints

Bond Angles
- Chemistry also fixes bond angles
- Depend on the types of atoms, hybridization states, and number of lone electron pairs
- Range is 100 degrees to 180 degrees
- A bond angle is a function of the spatial positions of three atoms

Dihedral Angles
- These vary
- Range from 0 to 360 degrees in principle
- Common in proteins are phi and psi
- A dihedral angle is a function of the spatial positions of four atoms

Ramachandran Plot
- Steric constraints restrict the possible set of dihedral angles
- Typical secondary structures have known dihedral angles
  - Alpha helix: phi = -57 degrees, psi = -47 degrees
  - Parallel beta strand: phi = -119 degrees, psi = 113 degrees
  - Antiparallel beta strand: phi = -139 degrees, psi = 135 degrees

Coordinate Frames
- Cartesian frame
  - Has an orthonormal x, y, z basis
  - Provides signed lengths for motion along each axis
  - Used in the Protein Data Bank (PDB)
- But since bond lengths and angles are basically constant, why not just specify dihedral angles?
  - Leads to the internal coordinate frame

Disadvantages of Internal Frame
- Basic computations like Euclidean distance are really difficult
- What about objects that aren't connected?
- Sometimes makes algorithms more complex

Object-Based Coordinate Frame
- Certain parts of proteins have less variability, like an alpha helix backbone
- Treat the helix backbone as a rigid object
- Reduces the number of parameters specified

Comparing Structures
- Compare structures A and B
- Need to know which atoms in A correspond to which in B
  - Get this from BLAST
- Need to know the positions of all atoms
  - Get this from the PDB

Comparing Structures
- How closely can two structures be superimposed?
- Need an objective function to measure this
  - If the structures are exactly the same, the measure is 0
  - If the structures are divergent, the measure is large
- RMSD

Algorithms
- Greedy search around the center of mass for the lowest RMSD
  - Superimpose centers of mass
  - Calculate RMSD
  - Rotate slightly
  - Recalculate RMSD and choose the lowest
- Method based on translation and rotation matrices
- Algorithm based on eigenvectors

Advantages of RMSD
- Nice behavior: 0 when identical, falls off continuously
- Easy to compute
- Units are natural (Angstroms)
- Commonly used
- Similar structures show 1-3 Angstroms RMSD

Disadvantages of RMSD
- All atoms are weighted equally
- Upper bound is variable
- Significance cutoff increases as size increases

Case Study: Myoglobin Superfamily
- Eight structures involved
  - Sperm whale myoglobin
  - Sea hare myoglobin
  - Plant leghemoglobin
  - Sea lamprey hemoglobin
  - Human alpha and beta hemoglobin chains
  - Chironomus hemoglobin
  - Bloodworm hemoglobin
- Aligned by hand because of low amino acid identity
- 115 common positions
- RMS computed for alpha carbons
- N(N - 1)/2 = 28 pairwise RMSs computed (N = 8)
  - Ranged from 1.22 to 3.16 Angstroms
  - Average was 2.19 Angstroms

Conclusions
- Compute bond lengths, bond angles, and dihedral angles
- Work in different coordinate frames
- Use RMSD for structure comparison
- Graphical superimposition can elucidate structural similarities and differences

The Protein Folding Problem
- The Search Space
- Definitions of Energy
- Computing Free Energy
- The Energy Function
- Monte Carlo Methods
- Molecular Dynamics

The Folding Problem
- How does the linear amino acid sequence fold to the 3-D shape off the ribosome?
- And more broadly, how do we get the 3-D structure given a linear amino acid sequence?

The Input Space
- Linear amino acid sequence
- Structure of each amino acid and the peptide backbone
  - Lists of atoms, bond lengths, bond angles
  - Ramachandran constraints on dihedral angles
- The medium
  - Water and dissolved solutes (salts)

The Output Space
- The 3-D coordinates of the protein in some frame
- Partial answers
  - 3-D structure of the active site
  - Location in the linear sequence of secondary structure
  - Prediction of class or family of the protein

Why should we care?
- Sequence -> Structure -> Function
- Structure is very useful for drug design
- Hard to get structures experimentally
  - X-ray crystallography: 80% of structures, 1-2 Angstroms
  - Nuclear magnetic resonance: 20% of structures, 1-3 Angstroms
  - Cryo-electron microscopy: 1% of structures, 7-10 Angstroms

How hard is the problem?
- Very hard
- Huge search space
  - For a 100 amino acid chain, assume each amino acid can be in either an alpha, beta, or coil state (a simplification)
  - $3^{100} \approx 5 \times 10^{47}$ possible distinct folds
  - At 1 fold every 0.10 ps, it takes $10^{27}$ years
  - The universe is $10^{10}$ years old

Why is the problem hard?
- How do we know when we have the correct fold?
- Need to measure interactions between amino acids, water, and other molecules
- You are folding proteins right now
  - You do it in seconds

Sampling the Output Space
- Secondary structure occurs regularly
  - Can form locally, independent of global structure
- Steric constraints eliminate some possibilities
- Maybe a nonrandom search
  - Local structure can form and induce cascades

Gibbs Free Energy
- $\Delta G = \Delta H - T \Delta S$
- Free energy = enthalpic energy - entropic energy
- $\Delta H$: benefits of interactions (negative for folding)
- $T \Delta S$: costs of imposing order (negative for folding)
- Proteins fold because $\Delta H < T \Delta S$
  - Usually just by a narrow margin

Entropy
- High entropy means disorder
- $S = k \ln \Omega$, where $\Omega$ is the number of arrangements
- If only 1 state is allowed, $\Omega = 1$ and $S = 0$
- Often hard to compute by statistical mechanics
- Turn to a more classical approach

Energy
- Total energy = potential + kinetic
- $E = U + K$
- Use Newtonian physical approximations
  - Atoms and bonds as balls and springs
- Seek energy minima

Writing an Energy Function
- Bond lengths
- Bond angles
- Dihedral angles (Ramachandran constraints)
- Packing term ("nature abhors a vacuum")
- Electrostatic interactions

Monte Carlo Algorithm
- Choose a starting position P
- Evaluate the objective (scoring) function S
- Perturb the current position (randomly or otherwise) to P' and compute S'
- If S' < S, let P = P'
- Else let P = P' with probability $e^{-(S' - S)}$
- Loop

Relative Energies
- Hydrogen bond
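
The Monte Carlo Algorithm slide above can be sketched in Python roughly as follows. This is a minimal illustration, not the lecture's own code: the function names `metropolis_search`, `toy_energy`, and `small_step` are hypothetical, the one-dimensional quadratic score stands in for a real protein energy function, and the `temperature` parameter is an added assumption (with `temperature=1.0` the acceptance probability reduces to the slide's e^(-(S' - S))).

```python
import math
import random


def metropolis_search(score, perturb, start, steps=10000, temperature=1.0, seed=0):
    """Minimize `score` by random perturbation with Metropolis acceptance.

    `temperature` is an assumption added for illustration; at 1.0 the
    acceptance probability for a worse move is exp(-(S' - S)).
    """
    rng = random.Random(seed)
    current = start
    current_score = score(current)
    best, best_score = current, current_score

    for _ in range(steps):
        candidate = perturb(current, rng)          # P -> P'
        candidate_score = score(candidate)         # S'
        delta = candidate_score - current_score    # S' - S

        # Accept improvements always; accept worse moves with
        # probability exp(-delta / temperature).
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score

    return best, best_score


def toy_energy(x):
    """Hypothetical 1-D score with a single minimum at x = 2."""
    return (x - 2.0) ** 2


def small_step(x, rng):
    """Hypothetical perturbation: move x by a small random amount."""
    return x + rng.uniform(-0.5, 0.5)


best_x, best_e = metropolis_search(toy_energy, small_step, start=-5.0)
print(best_x, best_e)
```

Occasionally accepting a worse position is what lets the search climb out of local minima; tracking the best position seen so far is a convenience added here and is not part of the loop on the slide.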



Stanford BIO 118 - Bioinformatics of Proteins
