UT CH 395 - Bayesian Probabilistic Approach for Predicting Backbone Structures in Terms of Protein Blocks


Bayesian Probabilistic Approach for Predicting Backbone Structures in Terms of Protein Blocks

A. G. de Brevern,1* C. Etchebest,1,2 and S. Hazout1

1 Equipe de Bioinformatique Génomique et Moléculaire, INSERM U436, Université Paris 7, Paris, France
2 Laboratoire de Biochimie Théorique, UPR 9080 CNRS, Institut de Biologie Physico-Chimique, Paris, France

ABSTRACT

By using an unsupervised cluster analyzer, we have identified a local structural alphabet composed of 16 folding patterns of five consecutive Cα ("protein blocks"). The dependence that exists between successive blocks is explicitly taken into account. A Bayesian approach based on the relation protein block-amino acid propensity is used for prediction and leads to a success rate close to 35%. Sharing sequence windows associated with certain blocks into "sequence families" improves the prediction accuracy by 6%. This prediction accuracy exceeds 75% when keeping the first four predicted protein blocks at each site of the protein. In addition, two different strategies are proposed: the first one defines the number of protein blocks in each site needed for respecting a user-fixed prediction accuracy, and alternatively, the second one defines the different protein sites to be predicted with a user-fixed number of blocks and a chosen accuracy. This last strategy applied to the ubiquitin conjugating enzyme (α/β protein) shows that 91% of the sites may be predicted with a prediction accuracy larger than 77% considering only three blocks per site. The prediction strategies proposed improve our knowledge about sequence-structure dependence and should be very useful in ab initio protein modelling.
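The Bayesian ranking summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the block labels, the two-letter residue alphabet, the three-residue window, all probabilities, and the function name `rank_blocks` are hypothetical, and the sketch assumes that residues at different window positions are independent given the block. It only shows how a posterior P(block | window) can be computed from a prior and block-specific residue propensities, and how the k most probable blocks are kept at a site.

```python
import math

# Hypothetical toy tables (not taken from the paper).
PRIORS = {"a": 0.5, "b": 0.3, "c": 0.2}  # P(block)
LIKELIHOODS = {                          # P(residue | block) at each window position
    "a": [{"A": 0.8, "G": 0.2}] * 3,
    "b": [{"A": 0.5, "G": 0.5}] * 3,
    "c": [{"A": 0.1, "G": 0.9}] * 3,
}


def rank_blocks(window, priors, likelihoods, top_k=4):
    """Return the top_k (block, posterior) pairs for one sequence window.

    Assumption: window positions are conditionally independent given the
    block, so P(window | block) is a product over positions.
    """
    log_scores = {}
    for block, prior in priors.items():
        score = math.log(prior)
        for pos, residue in enumerate(window):
            score += math.log(likelihoods[block][pos][residue])
        log_scores[block] = score

    # Normalise in log space so the posteriors sum to 1 (Bayes' rule).
    best = max(log_scores.values())
    total = sum(math.exp(s - best) for s in log_scores.values())
    posteriors = {b: math.exp(s - best) / total for b, s in log_scores.items()}

    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Keep the two most probable blocks for a toy window.
print(rank_blocks("GGG", PRIORS, LIKELIHOODS, top_k=2))
```

Returning a ranked list rather than a single block mirrors the idea of keeping the first few predicted blocks at each site; how many to keep is a user choice in this sketch.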
Proteins 2000;41:271–287. © 2000 Wiley-Liss, Inc.

Key words: protein backbone structure; unsupervised classifier; structure-sequence relationships; structure prediction; protein block; Bayesian approach; prediction strategies

*Correspondence to: Alexander de Brevern, Equipe de Bioinformatique Génomique et Moléculaire, INSERM U436, Université Paris 7, case 7113, 2, place Jussieu, 75251 Paris cedex 05, France. E-mail: [email protected]

Received 6 January 2000; Accepted 12 June 2000

INTRODUCTION

The protein sequence contains the whole information of the protein three-dimensional (3D) structure. Proteins cannot fold into an unlimited number of structural motifs [1,2]. Yet our lack of understanding of the physicochemical and kinetic factors involved in folding prevents us from advancing from knowledge of the primary sequence to reliable predictions of the biologically active 3D structure. The first level of the protein structure is the secondary structure, characterized in terms of α-helix, β-strand, and unrepetitive coil. A thousand different prediction algorithms have been developed, e.g., statistical methods like the pioneering GOR [3,4] or neural networks like the well-known PHD [5] and the more recent work of Chandonia and Karplus [6,7]. The accuracy of these works was strongly increased by the addition of multiple sequence alignments in the neural networks [8], the probabilistic approach [9], or computational informative encoding [10]. The increase in the entries in the biologic databases may permit an increase in the prediction rate [11].

Concerning the 3D structure, the ab initio protein folding algorithms, using only energetic or physicochemical parameters, were limited to small proteins [12–14]. Numerous studies describe the ab initio modeling of a 3D structure from the sole knowledge of its primary structure. However, due to the current weakness of the prediction rate, this determination is still an open field.

The results obtained in the recent CASP III meeting are the best witnesses of such tentative findings [15]. Assessing the compatibility of the sequence with known structures is an alternative approach to find the best approximation of the protein fold [16,17]. Most methods for finding the folding state of a protein are based mainly on the use of the 3D structure of homologous proteins combined with simplified spatial restraints, statistical analysis, and physicochemical constraints [18,19].

Recently, the use of a fragment library [20] more detailed than three states and based on the most frequent local structural motifs (in terms of polypeptide backbone) encountered in the ensemble of the 3D protein structure database has led to improved results [21,22] within a knowledge-based ab initio method [23].

Clearly, the main difficulty to overcome resides along the pathway going from the secondary structure prediction to the tertiary structure prediction. In this spirit, the study of the local conformations of proteins has a long history, principally based on the study of the classic repetitive structures. Interesting works include those based on the geometric and sequential characterization of α-helices [24] or discrimination between the different types of β-turns [25]. Most algorithms that described global conformations of the proteins used this simple structural alphabet [26–28]. Recently, with the constant growth of the Protein Data Bank, automatic searches designed to determine families of specific coils have been carried out [29–31].

Among the different works concerning the definition of a structural alphabet (the consensus structural patterns will be labelled protein blocks, or PBs), two main types of libraries of PBs can be distinguished: those composed of a high number (around 100) of protein blocks for describing protein structures and those characterized by a limited number of fold prototypes (4 to 13).

In the first type, the use of small blocks (fragments of six amino acids) for rebuilding a protein structure began with the work of Unger et al. [32], using the RMSD (root mean square deviation) as criterion. The authors identified about 100 building blocks that could replace about 76% of all hexamers with an error of less than 1 Å. Schuchhardt et al. [33] similarly obtained a library of 100 structural motifs by an unsupervised learning algorithm from the series of dihedral angles. These libraries are adequate for approximating a 3D protein structure; however, they are not easily usable for prediction.

In the second type of approaches, Rooman et al. define recurrent folding motifs by a clustering algorithm using the RMSD on distances between selected backbone atoms [34]. They described 16
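The RMSD criterion mentioned above for comparing hexamer fragments can be illustrated with a short sketch. It rests on several assumptions that are not in the text: the two fragments are given as lists of (x, y, z) coordinates in Å, one point per residue, and they are already optimally superposed (a real comparison would first superpose them, for example with the Kabsch algorithm). The function names `rmsd` and `is_replaceable` are hypothetical; the default 1 Å cutoff echoes the error bound quoted for the building-block library.

```python
import math


def rmsd(frag_a, frag_b):
    """Root mean square deviation between two equally long fragments.

    Assumption: the fragments are already superposed; no rotation or
    translation fitting is done here.
    """
    if len(frag_a) != len(frag_b):
        raise ValueError("fragments must contain the same number of atoms")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(frag_a, frag_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(frag_a))


def is_replaceable(fragment, building_block, cutoff=1.0):
    """True if the building block approximates the fragment within cutoff (Å)."""
    return rmsd(fragment, building_block) < cutoff


# Toy hexamer: six points 3.8 Å apart on a line, and a copy shifted by 0.5 Å.
reference = [(3.8 * i, 0.0, 0.0) for i in range(6)]
shifted = [(x + 0.5, y, z) for x, y, z in reference]
print(rmsd(reference, shifted), is_replaceable(reference, shifted))
```

A library of building blocks could then be scanned with `is_replaceable` to find which fragments of a structure it covers at the chosen cutoff.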

