UMD CMSC 838T - Predicting Function: From Genes to Genomes and Back


Article No. mb982144
J. Mol. Biol. (1998) 283, 707-725

REVIEW

Predicting Function: From Genes to Genomes and Back

Peer Bork*, Thomas Dandekar, Yolande Diaz-Lazcoz, Frank Eisenhaber, Martijn Huynen and Yanping Yuan

*Corresponding author

Predicting function from sequence using computational tools is a highly complicated procedure that is generally done for each gene individually. This review focuses on the added value that is provided by completely sequenced genomes in function prediction. Various levels of sequence annotation and function prediction are discussed, ranging from genomic sequence to that of complex cellular processes. Protein function is currently best described in the context of molecular interactions. In the near future it will be possible to predict protein function in the context of higher order processes such as the regulation of gene expression, metabolic pathways and signalling cascades. The analysis of such higher levels of function description uses, besides the information from completely sequenced genomes, also the additional information from proteomics and expression data. The final goal will be to elucidate the mapping between genotype and phenotype.

© 1998 Academic Press

Keywords: genomes; computational tools; function prediction; comparative genome analysis; proteomics

Genomes and function prediction

Prediction of protein function using computational tools becomes more and more important as the gap between the increasing amount of sequences and the experimental characterization of the respective proteins widens (Bork & Koonin, 1998; Smith, 1998). With the availability of complete genomes we face a new quality in the prediction process (Table 1) as context information can be utilized when analysing particular sequences. This review focuses on the added value of genomic information on the many steps of function prediction from genomic sequence.
The first reports on completely sequenced genomes give an excellent overview of the evolving state of the art in the analyses of particular genomes (Fleischmann et al., 1995; Fraser et al., 1995, 1998; Himmelreich et al., 1996; Goffeau et al., 1996; Kaneko et al., 1996; Blattner et al., 1997; Tomb et al., 1997; Kunst et al., 1997; Bult et al., 1996; Smith et al., 1997; Klenk et al., 1997). In addition, there are numerous reviews that touch on the extraction of functional features from sequence (e.g. Bork et al., 1994; Andrade et al., 1997; Koonin & Galperin, 1997; Bork & Koonin, 1998), but very few reviews have been published that systematically summarize the additional information for function prediction that is provided by the presence of entirely sequenced genomes (original papers e.g. by Mushegian & Koonin, 1996a,b; Himmelreich et al., 1997; Koonin et al., 1997; Tatusov et al., 1996, 1997; Huynen & Bork, 1998; Huynen et al., 1997, 1998a; Dandekar et al., 1998b).

Present address: Y. Diaz-Lazcoz, Laboratoire Génome et Informatique, Bâtiment Buffon, Université de Versailles-Saint Quentin, 45, avenue des Etats-Unis, 78035 Versailles Cedex, France.

E-mail addresses: Bork@EMBL-Heidelberg.de, Dandekar@EMBL-Heidelberg.de, [email protected], Eisenhaber@EMBL-Heidelberg.de, Huynen@EMBL-Heidelberg.de, Yuan@EMBL-Heidelberg.de

0022-2836/98/440707-19 $30.00/0

What is function?

"Function" is a very loosely defined term that only makes sense in context. Most current efforts aim at predicting protein function, but there are other types of function, e.g. RNA function or organelle function, that also need to be explored. Even to describe "protein function" requires a broad range of attributes and features (Figure 1). Molecular features such as enzymatic activity, interaction partners, and pathway context are currently being predicted, but only qualitatively. Expression patterns, regulation, kinetic properties, localization and concentration effects and, even more so, dysfunctions, environmental influence, fitness contribution or clinical symptoms can currently hardly be predicted. There is furthermore a relatively poor knowledge of the mechanisms of posttranslational modifications (Esko & Zhang, 1996). For example, although some sequence patterns for preferred glycosylation sites are known, the prediction accuracy

Table 1. Added features from complete genome analysis for function prediction

Genome-specific patterns in the DNA and their usage in genome annotation

Feature: Genome-specific (poly)nucleotide frequencies, codon usage
Usage:
- Identification of genes
- Identification of recent horizontal gene transfers into the genome

Feature: Genome-specific signal sequences like regulatory regions, promoters
Usage:
- Gene identification, identification of the mode of regulation of genes, regulatory regions in mRNA, specification of the boundaries of genes
- Operon identification

Usage of the complete set of genes in a genome and comparative genome analysis

Feature: The finding of orthologs by comparative genome analysis
Usage:
- Narrowing down the function of a gene
- Identification of (conserved) regulatory signals neighbouring the orthologues

Feature: Conserved genome organization
Usage:
- Genes in conserved clusters have related functions, show physical interaction

Feature: Differential genome analysis
Usage:
- Identification of the functions that are absent from a genome
- If an orthologous gene is absent, but the function is present, missing genes point either to a wrong annotation or a non-orthologous gene transfer
- Identification of the functions that are specific to a genome, and might be responsible for the species' specific phenotype, delineation of the mapping between genotype and phenotype
- Correlation in the pattern of occurrence of genes in the comparison of multiple genomes points to functional relations between the genes

Feature: Complete list of detected gene sequences
Usage:
- Identifying the optimal candidate gene in the whole genome for an observed enzymatic activity

Various types of pattern and (context) information that become available with the analysis of the complete genome can be used for function prediction at "lower levels", e.g. in the prediction of the function of single genes.
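The differential genome analysis entries of Table 1 can be illustrated with a minimal Python sketch. It is not the authors' method: the genome names and orthologous-group identifiers are invented toy data, the function names are hypothetical, and "correlation in the pattern of occurrence" is simplified here to exact identity of presence/absence profiles.

```python
from itertools import combinations

# Hypothetical toy data: genome name -> set of orthologous-group identifiers.
# Names and identifiers are invented for illustration only.
GENOMES = {
    "genome_A": {"og1", "og2", "og3", "og4", "og6"},
    "genome_B": {"og1", "og2", "og5"},
    "genome_C": {"og1", "og3", "og4"},
    "genome_D": {"og1", "og5"},
}


def _groups_in_others(target, genomes):
    """Union of the groups found in every genome except the target."""
    return set().union(*(g for name, g in genomes.items() if name != target))


def absent_from(target, genomes):
    """Groups present in at least one other genome but missing from target."""
    return _groups_in_others(target, genomes) - genomes[target]


def specific_to(target, genomes):
    """Groups present in target and in no other genome."""
    return genomes[target] - _groups_in_others(target, genomes)


def occurrence_profile(group, genomes):
    """Presence/absence pattern of one group, in sorted genome-name order."""
    return tuple(group in genomes[name] for name in sorted(genomes))


def co_occurring_pairs(genomes):
    """Pairs of groups sharing an identical, non-universal occurrence profile."""
    all_groups = sorted(set().union(*genomes.values()))
    pairs = []
    for a, b in combinations(all_groups, 2):
        profile_a = occurrence_profile(a, genomes)
        profile_b = occurrence_profile(b, genomes)
        # Groups present in every genome carry no discriminating signal.
        if profile_a == profile_b and sum(profile_a) < len(genomes):
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    print(sorted(absent_from("genome_B", GENOMES)))
    print(sorted(specific_to("genome_A", GENOMES)))
    print(co_occurring_pairs(GENOMES))
```

In practice the presence/absence sets would come from an orthology assignment across completely sequenced genomes, and a graded similarity measure between profiles would likely replace the exact-match test used in this sketch.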

