CORNELL CS 726 - Study Notes


Using surface envelopes for discrimination of molecular models

JONATHAN M. DUGAN AND RUSS B. ALTMAN
Department of Genetics, Informatics Laboratory, Stanford University, Stanford, California 94305, USA

(RECEIVED August 20, 2003; FINAL REVISION August 20, 2003; ACCEPTED September 29, 2003)

Abstract

Shape information about macromolecules is increasingly available but is difficult to use in modeling efforts. We demonstrate that shape information alone can often distinguish structural models of biological macromolecules. By using a data structure called a surface envelope (SE) to represent the shape of the molecule, we propose a method that generates a fitness score for the shape of a particular molecular model. This score correlates well with root mean squared deviation (RMSD) of the model to the known test structures and can be used to filter models in decoy sets. The scoring method requires both alignment of the model to the SE in three-dimensional space and assessment of the degree to which atoms in the model fill the SE. Alignment combines a hybrid algorithm using principal components and a previously published iterated closest point algorithm. We test our method against models generated from random atom perturbation from crystal structures, published decoy sets used in structure prediction, and models created from the trajectories of atoms in molecular modeling runs. We also test our alignment algorithm against experimental electron microscopic data from rice dwarf virus. The alignment performance is reliable, and we show a high correlation between model RMSD and score function. This correlation is stronger for molecular models with greater oblong character (as measured by the ratio of largest to smallest principal component).

Keywords: surface; fitness function; shape; molecular modeling; principal components

The structures of biological macromolecules provide useful information in a variety of research efforts.
For example, protein and nucleic acid structures help us understand basic biological and molecular interactions. Similarly, disease mechanisms are better understood when their pathologies can be described at the atomic level. Better drugs and interventions are possible when the molecular and atomic structures involved are known. The gold standard technique for measuring the structure of biomolecules is X-ray crystallography (Branden and Tooze 1999). Unfortunately, the number of possible structures in nature that are of interest for biology and medicine vastly surpasses the number of solved structures. Although the rate of experimental structure determination has increased significantly in recent years due to both academic and industrial efforts, the gap between solved and desired structures will remain for many years. Therefore, molecular modeling of structures based on incomplete structural information will remain important.

One useful type of structural data is information about the shape of the molecule. A variety of experimental and computational techniques provides incremental information about the expected shape of a biological macromolecule, including electron microscopy (EM; Frank 1996), sedimentation experiments (Urbanke and Ziegler 1980), homology modeling (Sali and Blundell 1993; Simons et al. 1999), and small-angle scattering experiments (Kaiushina et al. 1985). The most common of these methods is EM, which can directly visualize molecular structures of significant size, such as protein complexes. However, the data resolution of EM is currently 7 to 9 Å (Chiu et al. 2002), whereas crystallography and NMR are in the 2 to 4 Å range. Thus, assignment of individual atom positions is very challenging using EM alone.

Reprint requests to: Russ B. Altman, Department of Genetics, Informatics Laboratory, 300 Pasteur Drive, Room L329, Stanford University, Stanford, CA 94305, USA; e-mail: [email protected]; fax: (650) 725-7944.
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.03385504.
Protein Science (2004), 13:15–24. Published by Cold Spring Harbor Laboratory Press. Copyright © 2004 The Protein Society

We propose and test a computational method that applies shape information as a discrimination metric in the evaluation of structural models. If a model has more shape agreement with measured shape information, that model is more likely to be correct than are models with less agreement. To our knowledge, shape information has not been routinely and automatically used in assessing model fitness. Another potential use for a general shape scoring system (not demonstrated in this work) would be within the context of building novel molecular models using shape information.

To take advantage of the data sources that contribute shape information, we developed a unified, linear data structure to encode shape information called the surface envelope (SE). An SE is any three-dimensional data structure that assigns a number between zero and one to each point in space corresponding linearly to the amount of electron density observed at that point. These numbers are called density values. In practice, assigning density values to every point in real space is computationally expensive, so our SE implementation assigns a regular cubic grid over three-dimensional space, and associates one density value to each grid point. The region around each grid point is called a box and has the shape of a cube. We assign all points within each box to have the same density value as the value associated with the central grid point. Boxes contiguously span three-dimensional space in each direction.
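The grid-based SE described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `SurfaceEnvelope`, its fields, the flattened storage order, and the choice to return zero density outside the grid are all assumptions made for illustration. The sketch only shows how a point in space maps to the cubic box centred on its nearest grid point and inherits that grid point's density value.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SurfaceEnvelope:
    """Hypothetical SE container: a regular cubic grid of densities in [0, 1]."""
    origin: Tuple[float, float, float]  # coordinates of grid point (0, 0, 0)
    spacing: float                      # edge length of each cubic box
    shape: Tuple[int, int, int]         # number of grid points along x, y, z
    values: List[float]                 # flattened densities, x-major order

    def __post_init__(self) -> None:
        nx, ny, nz = self.shape
        if len(self.values) != nx * ny * nz:
            raise ValueError("values length does not match grid shape")
        if any(not 0.0 <= v <= 1.0 for v in self.values):
            raise ValueError("density values must lie between zero and one")

    def box_index(self, point) -> Optional[Tuple[int, int, int]]:
        """Return (i, j, k) of the box containing point, or None if off-grid."""
        index = []
        for coord, start, n in zip(point, self.origin, self.shape):
            # Boxes are centred on grid points, so shift by half a box
            # before flooring to find the nearest grid point.
            i = math.floor((coord - start) / self.spacing + 0.5)
            if not 0 <= i < n:
                return None
            index.append(i)
        return tuple(index)

    def density_at(self, point) -> float:
        """Every point in a box shares the density of its central grid point."""
        index = self.box_index(point)
        if index is None:
            return 0.0  # assumption: no density outside the gridded region
        i, j, k = index
        _, ny, nz = self.shape
        return self.values[(i * ny + j) * nz + k]


# Usage: a 2 x 2 x 2 grid with 2.0-unit boxes.
se = SurfaceEnvelope(
    origin=(0.0, 0.0, 0.0),
    spacing=2.0,
    shape=(2, 2, 2),
    values=[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 1.0],
)
print(se.density_at((1.1, 2.0, 2.9)))  # point falls in box (1, 1, 1)
```

Storing one value per box keeps lookup constant-time, which is the practical motivation the text gives for discretizing instead of assigning a density to every point in real space.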
Figure 1A shows an example of an SE.

We use the phrase “surface envelope” to avoid confusion with two closely related concepts: the molecular surface and surface accessibility. Molecular surfaces are created by thresholding electron density data and defining the boundary between the inside and outside of a particular molecule. This is a two-dimensional data structure embedded in three dimensions, whereas the SE is a three-dimensional data structure. Surface accessibility relates how close an atom or residue sits to the molecular surface (Schmidt et al. 1998). This is a one-dimensional measurement.

Using shape information to assess the quality of a structural model is a complex task for a variety of reasons. There are two parts to the problem of creating a function capable of scoring model-envelope matches: (1) aligning the model structure to the SE and (2) generating a score from the alignment. An alignment of two

