Stanford BIO 118 - Bioinformatics of Proteins

Bioinformatics of Proteins
• Atomic Properties
• The Folding Problem
• Structure Alignments
• Structure Prediction
Reza Jacob
4 June 2001
Biochemistry 118Q

Proteins in Bioinformatics
• How do we represent structures for computation?
• How do we compare structures in silico?
• How do we classify structures hierarchically?

The Plan
• Apply the constraints of chemistry
  – Bond lengths, bond angles, dihedral (torsion) angles
• Place in a coordinate frame
  – Cartesian, internal, and object-based frames
• Compare structures with discrete components
  – Root mean squared deviation (RMSD)

Basic Measurements
• Bond lengths
• Bond angles
• Dihedral (torsion) angles

Bond Length
• Bond length is essentially fixed for a given pair of bonded atoms
• Depends on the type of bond (single, double, triple) and on hybridization
• Depends on which two atoms are bonded
• C-H is about 1.1 Å; C-C (single bond) is about 1.5 Å
• Bond length can be computed from the spatial positions of the two atoms

Bond Length is Euclidean Distance
For (x_1, y_1, z_1) and (x_2, y_2, z_2):
$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$
• Some non-covalent distances are also constant in a peptide's backbone
• The Cα-Cα distance between consecutive amino acids is constant too (about 3.8 Å), because of dihedral constraints: the planar peptide bond holds ω near 180° (trans)

Bond Angles
• Chemistry also fixes bond angles
• Depends on the types of atoms, their hybridization states, and the number of lone electron pairs
• Range is roughly 100° to 180°
• A bond angle is a function of the spatial positions of three atoms

Dihedral Angles
• These vary
• Range from 0° to 360° in principle (equivalently -180° to +180°)
• Common in proteins are φ, ψ, ω, and χ
• A dihedral angle is a function of the spatial positions of four atoms

Ramachandran Plot
[Figure: Ramachandran plot] Steric constraints restrict the possible set of dihedral angles

Typical Secondary Structures have known Dihedral Angles
• Alpha helix
  – φ = -57°, ψ = -47°
• Parallel beta strand
  – φ = -119°, ψ = 113°
• Antiparallel beta strand
  – φ = -139°, ψ = 135°

Coordinate Frames
• The Cartesian frame has an orthonormal (x, y, z) basis and gives signed lengths for motion along each axis (used in the Protein Data Bank, PDB)
• But since bond lengths and angles are basically constant, why not just specify dihedral angles?
• This leads to the internal coordinate frame

Disadvantages of Internal Frame?
• Basic computations (like Euclidean distance) are really difficult
• How do we handle objects that aren't connected?
• Sometimes makes algorithms more complex

Object-Based Coordinate Frame
• Certain parts of proteins, like an alpha-helix backbone, have less variability
• Treat the helix backbone as a rigid object
• Reduces the number of parameters that must be specified

Comparing Structures
• Compare structures A and B
• Need to know which atoms in A correspond to which in B
  – Get this from a sequence alignment (e.g., BLAST)
• Need to know the positions of all atoms
  – Get this from the PDB

Comparing Structures
• How closely can two structures be superimposed?
• Need an objective function to measure this
• If the structures are exactly the same, the measure is 0
• If the structures are divergent, the measure is large

RMSD Algorithms
• Greedy search around the center of mass for the lowest RMSD
  – Superimpose centers of mass
  – Calculate RMSD
  – Rotate slightly
  – Recalculate RMSD, and choose the lowest
• *Method based on translation and rotation matrices*
  – Algorithm based on eigenvectors

Advantages of RMSD
• Nice behavior
  – 0 when identical, changes continuously as structures diverge
• Easy to compute
• Units are natural (Å)
• Commonly used
• Similar structures show 1-3 Å RMSD

Disadvantages of RMSD
• All atoms are weighted equally
• No fixed upper bound
• The cutoff for a significant RMSD increases as protein size increases

Case Study: Myoglobin Superfamily
• Eight structures involved:
  – Sperm whale myoglobin
  – Sea hare myoglobin
  – Plant leghemoglobin
  – Sea lamprey hemoglobin
  – Human alpha and beta hemoglobin chains
  – Chironomus hemoglobin
  – Bloodworm hemoglobin
• Aligned by hand because of low amino acid identity
• 115 common positions

RMS for alpha carbons
• N(N-1)/2 = 28 pairwise RMSDs computed (N = 8)
• Ranged from 1.22 to 3.16 Å
• Average was 2.19 Å

Conclusions
• Compute bond lengths, bond angles, and dihedral angles
• Work in different coordinate frames
• Use RMSD for structure comparison
• Graphical superimposition can elucidate structural similarities and differences

The Protein Folding Problem
• The Search Space
• Definitions of Energy
• Computing Free Energy
• The Energy Function
• Monte Carlo Methods
• Molecular Dynamics

The Folding Problem
• How does the linear amino acid sequence fold into its 3-D shape after leaving the ribosome?
• More broadly, how do we get the 3-D structure from a linear amino acid sequence?

The Input Space
• Linear amino acid sequence
• Structure of each amino acid and of the peptide backbone
  – Lists of atoms, bond lengths, bond angles
  – Ramachandran constraints on dihedral angles
• The medium
  – Water and dissolved solutes (salts)

The Output Space
• The 3-D coordinates of the protein in some frame
• Partial answers:
  – 3-D structure of the active site
  – Location of secondary structure in the linear sequence
  – Prediction of the "class" or "family" of the protein

Why should we care?
• Sequence → Structure → Function
• Structure is very useful for drug design
• Structures are hard to get experimentally
  – X-ray crystallography (80%): 1-2 Å
  – Nuclear magnetic resonance (20%): 1-3 Å
  – Cryo-electron microscopy (<<1%): 7-10 Å

How hard is the problem? Very hard
• Huge search space
• For a 100-residue chain, assume each amino acid can be in one of three states: alpha, beta, or coil (a simplification)
• $3^{100} \approx 5 \times 10^{47}$ possible distinct folds
• At 1 fold every 0.10 ps, trying them all takes about $10^{27}$ years
• The universe is about $10^{10}$ years old

Why is the problem hard?
• How do we know when we have the "correct" fold?
• Need to measure interactions among amino acids, water, and other molecules
• You are folding proteins right now
• You do it in seconds

Sampling the Output Space
• Secondary structure occurs regularly
  – Can form locally, independent of global structure
• Steric constraints eliminate some possibilities
• Maybe a nonrandom search?
  – Local structure can form and induce cascades

Gibbs Free Energy
• ΔG = ΔH - TΔS
• Free energy = enthalpic energy - entropic energy
  – ΔH = benefits of interactions (negative for folding)
  – TΔS = costs of imposing order (negative for folding)
• Proteins fold because ΔH < TΔS, which makes ΔG negative
• Usually just by a narrow margin

Entropy
• High entropy means disorder
• S = k ln Ω, where Ω is the number of arrangements (microstates) and k is Boltzmann's constant
• If only 1 state is allowed, Ω = 1 and S = 0
• Often hard to compute by statistical mechanics
• Turn to a more classical

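The "Basic Measurements" slides above treat bond length, bond angle, and dihedral angle as functions of the positions of two, three, and four atoms. Here is a minimal pure-Python sketch of all three, assuming Cartesian (x, y, z) coordinates in Å; the function names are illustrative, not from the course. The dihedral uses a common atan2 formulation with the IUPAC sign convention and returns values between -180° and 180° rather than 0° to 360°.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond_length(p1: Vec, p2: Vec) -> float:
    """Euclidean distance between two atoms (in Å if the coordinates are in Å)."""
    d = _sub(p1, p2)
    return math.sqrt(_dot(d, d))


def bond_angle(p1: Vec, p2: Vec, p3: Vec) -> float:
    """Angle p1-p2-p3 in degrees; p2 is the central atom."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cos_theta = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    # Clamp: rounding can push the cosine slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def dihedral(p1: Vec, p2: Vec, p3: Vec, p4: Vec) -> float:
    """Signed torsion angle about the p2-p3 bond, in degrees (-180 to 180).

    Positive when, looking from p2 toward p3, the p2-p1 bond must turn
    clockwise to eclipse the p3-p4 bond (IUPAC convention).
    """
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For a protein backbone, φ of residue i would be dihedral(C[i-1], N[i], CA[i], C[i]) and ψ would be dihedral(N[i], CA[i], C[i], N[i+1]), where C, N, and CA are hypothetical lists of backbone atom coordinates. Adding 360° to negative results gives the 0° to 360° convention.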

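To make the "Typical Secondary Structures have known Dihedral Angles" slide above concrete, the toy sketch below reports which of the three listed (φ, ψ) pairs lies closest to a residue's angles. The nearest-reference rule, the distance measure, and the names are assumptions made for illustration only; real secondary-structure assignment (for example DSSP) relies on backbone hydrogen-bond patterns, not on a rule like this.

```python
import math

# (phi, psi) in degrees, copied from the "Typical Secondary Structures" slide.
IDEAL_PHI_PSI = {
    "alpha helix": (-57.0, -47.0),
    "parallel beta strand": (-119.0, 113.0),
    "antiparallel beta strand": (-139.0, 135.0),
}


def angle_diff(a: float, b: float) -> float:
    """Smallest signed difference a - b between two angles, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def nearest_ideal(phi: float, psi: float) -> tuple[str, float]:
    """Toy rule: return the closest reference (phi, psi) and its distance.

    Distance is Euclidean on the (phi, psi) torus, so -179 and 179 are
    2 degrees apart. This is NOT a real secondary-structure assignment method.
    """
    best_name, best_dist = "", math.inf
    for name, (ref_phi, ref_psi) in IDEAL_PHI_PSI.items():
        dist = math.hypot(angle_diff(phi, ref_phi), angle_diff(psi, ref_psi))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist
```

The wraparound in angle_diff matters because dihedrals are periodic: without it, residues near ψ = 180° and ψ = -180° would look maximally different even though they are nearly identical.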

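The "Coordinate Frames" slides above note that an internal frame (bond lengths, bond angles, dihedrals) makes basic computations such as Euclidean distance awkward. A common workaround is to convert internal coordinates back to Cartesian ones, one atom at a time. The sketch below uses the Natural Extension Reference Frame (NeRF) construction, a technique chosen here for illustration rather than one the notes name; place_atom and build_chain are hypothetical helpers, angles are in degrees, and the dihedral follows the IUPAC sign convention.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    length = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / length, a[1] / length, a[2] / length)


def place_atom(a: Vec, b: Vec, c: Vec,
               bond: float, angle_deg: float, torsion_deg: float) -> Vec:
    """Place atom d from internal coordinates (NeRF construction).

    bond = |d - c|, angle_deg = angle b-c-d, torsion_deg = dihedral a-b-c-d.
    a, b and c must not be collinear.
    """
    theta, phi = math.radians(angle_deg), math.radians(torsion_deg)
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))  # normal to the a-b-c plane
    m = _cross(n, bc)                  # completes a right-handed orthonormal frame
    # Coordinates of d - c in the local (bc, m, n) frame.
    x = -bond * math.cos(theta)
    y = bond * math.sin(theta) * math.cos(phi)
    z = bond * math.sin(theta) * math.sin(phi)
    return (c[0] + x * bc[0] + y * m[0] + z * n[0],
            c[1] + x * bc[1] + y * m[1] + z * n[1],
            c[2] + x * bc[2] + y * m[2] + z * n[2])


def build_chain(first_three: list[Vec],
                internal: list[tuple[float, float, float]]) -> list[Vec]:
    """Extend three placed atoms; each entry is (bond, angle, torsion)."""
    atoms = list(first_three)
    for bond, angle, torsion in internal:
        atoms.append(place_atom(atoms[-3], atoms[-2], atoms[-1], bond, angle, torsion))
    return atoms
```

The loop shows why the internal frame complicates distance calculations: each atom's Cartesian position depends on every atom placed before it, so the distance between two residues far apart in sequence requires rebuilding the chain between them.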

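The "RMSD Algorithms" slide above outlines a greedy search: superimpose the centers of mass, compute the RMSD, rotate slightly, and keep the lowest value. For N corresponding atoms a_i and b_i, $\mathrm{RMSD} = \sqrt{\tfrac{1}{N}\sum_{i=1}^{N} \lVert a_i - b_i \rVert^2}$. A minimal pure-Python sketch of that greedy idea follows, assuming the atom correspondence is already known (a[i] matches b[i]) and every atom is weighted equally. The 10° starting step, the halving schedule, and all function names are illustrative assumptions. As a local search it yields an upper bound on the best-fit RMSD; the matrix- and eigenvector-based methods on the same slide find the optimal rotation directly.

```python
import math
from itertools import combinations

Point = tuple[float, float, float]


def center(coords: list[Point]) -> list[Point]:
    """Translate coordinates so their unweighted center of mass is at the origin."""
    n = len(coords)
    c = [sum(p[i] for p in coords) / n for i in range(3)]
    return [(p[0] - c[0], p[1] - c[1], p[2] - c[2]) for p in coords]


def rmsd(a: list[Point], b: list[Point]) -> float:
    """RMSD between corresponding atoms a[i] and b[i], with no fitting."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists with the same number of atoms")
    total = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
                for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def rotate(coords: list[Point], axis: int, degrees: float) -> list[Point]:
    """Rotate points about the x (0), y (1) or z (2) axis through the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    i, j = [(1, 2), (2, 0), (0, 1)][axis]  # the two coordinates the rotation mixes
    out = []
    for p in coords:
        q = list(p)
        q[i] = c * p[i] - s * p[j]
        q[j] = s * p[i] + c * p[j]
        out.append((q[0], q[1], q[2]))
    return out


def greedy_superposed_rmsd(a: list[Point], b: list[Point],
                           step: float = 10.0, min_step: float = 0.01) -> float:
    """Superimpose centers of mass, then keep any small rotation of b that
    lowers the RMSD; halve the step size whenever no rotation helps."""
    a, b = center(a), center(b)
    best = rmsd(a, b)
    while step >= min_step:
        improved = False
        for axis in range(3):
            for sign in (1.0, -1.0):
                trial = rotate(b, axis, sign * step)
                r = rmsd(a, trial)
                if r < best - 1e-12:
                    b, best, improved = trial, r, True
        if not improved:
            step /= 2.0
    return best


def pairwise_rmsds(structures: list[list[Point]]) -> list[float]:
    """All N(N-1)/2 pairwise values (N = 8 gives 28 pairs)."""
    return [greedy_superposed_rmsd(s, t) for s, t in combinations(structures, 2)]
```

Applied to eight aligned Cα sets, pairwise_rmsds would return the same kind of 28-value table summarized on the "RMS for alpha carbons" slide; this sketch is not claimed to reproduce those particular numbers.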

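The "How hard is the problem?" slide above is a back-of-the-envelope calculation that is easy to check. This sketch assumes, as the slide does, three states per residue and one fold every 0.10 ps; the function names and the 365.25-day year are illustrative choices.

```python
def conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of distinct folds if every residue independently takes one of a few states."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_folds: int, seconds_per_fold: float = 0.10e-12) -> float:
    """Time to visit every fold at a fixed rate (default: 1 fold per 0.10 ps)."""
    seconds_per_year = 365.25 * 24 * 3600  # Julian year
    return n_folds * seconds_per_fold / seconds_per_year
```

conformation_count(100) is about $5.15 \times 10^{47}$, and exhaustive_search_years of that count is about $1.6 \times 10^{27}$ years, matching the slide's orders of magnitude. This is the classic Levinthal-style argument that folding cannot be a random exhaustive search.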

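The "Gibbs Free Energy" and "Entropy" slides above reduce to two formulas, ΔG = ΔH - TΔS and S = k ln Ω. A minimal sketch of both follows, assuming the caller supplies consistent units; the function names are illustrative, and the Boltzmann constant is the exact 2019 SI value in J/K.

```python
import math

BOLTZMANN_K = 1.380649e-23  # J/K, exact in the 2019 SI


def gibbs_free_energy(delta_h: float, temperature: float, delta_s: float) -> float:
    """ΔG = ΔH - TΔS. Units must agree, e.g. kJ/mol, K and kJ/(mol·K)."""
    return delta_h - temperature * delta_s


def folds_spontaneously(delta_h: float, temperature: float, delta_s: float) -> bool:
    """Folding is favorable when ΔG < 0, i.e. when ΔH < TΔS."""
    return gibbs_free_energy(delta_h, temperature, delta_s) < 0.0


def boltzmann_entropy(omega: int) -> float:
    """S = k ln Ω for Ω equally likely arrangements; Ω = 1 gives S = 0."""
    if omega < 1:
        raise ValueError("omega must be at least 1")
    return BOLTZMANN_K * math.log(omega)
```

With made-up values ΔH = -200 kJ/mol and ΔS = -0.6 kJ/(mol·K) at 298 K, gibbs_free_energy returns about -21.2 kJ/mol: a small difference between two much larger terms, which echoes the "narrow margin" bullet. These numbers are purely illustrative, not measurements for any protein.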