Mizzou INFOINST 8010 - Protein Function Prediction

Protein Function Prediction
Jianlin Cheng, PhD
Department of Computer Science
Informatics Institute
2009

References
• Slides and documents at: www.geneontology.org
• R. Rentzsch and C.A. Orengo, Trends in Biotechnology, 2009.

Widely Used Systems for Protein Function Definition
• Enzyme Commission (EC), Transporter Classification (TC)
• Riley scheme: assigns prokaryotic gene products to cellular processes
• The MIPS Functional Catalogue (FUNCAT): extension of the Riley scheme to all three domains of life
• Kyoto Encyclopedia of Genes and Genomes (KEGG)
• Gene Ontology (GO): molecular function, biological process, and cellular component.

www.geneontology.org

Gene Ontology widely adopted
[Figure: resources that have adopted GO, e.g. AgBase]

• Biological process ontology: Which process is a gene product involved in?
• Molecular function ontology: Which molecular function does a gene product have?
• Cellular component ontology: Where does a gene product act?

Gene Ontology
• GO cellular component term: GO:0005743. Where is it? Mitochondrial p450
• GO molecular function term: GO:0004497. What does it do? substrate + O2 = CO2 + H2O product; monooxygenase activity
• GO biological process term: GO:0006118. Which process is this? electron transport
http://ntri.tamuk.edu/cell/mitochondrion/krebpic.html

is_a: DNA binding is a type of nucleic acid binding.
is_a: Nucleic acid binding is a type of binding.
Molecular function ontology

Biological process ontology
is_a: Adaxial/abaxial pattern formation is a type of pattern specification.
part_of: Adaxial/abaxial pattern specification is a part of adaxial/abaxial pattern formation.

Cellular component ontology
part_of: nucleus is part of the intracellular domain.
is_a: membrane-bound organelle is a type of organelle.

Categorizing gene products is called ‘annotation’.
• process: The gene product inner no outer is involved in adaxial/abaxial axis specification.
• function: The gene product inner no outer has transcription factor activity.
• component: The gene product inner no outer is active in the nucleus.
Clark et al., 2005

Fun: Biological Process
[Diagram: part_of and is_a relationships around the term "courtship behavior"]

The Gene Ontology is like a dictionary
Each concept has:
• a name (term: transcription initiation)
• a definition (definition: Processes involved in the assembly of the RNA polymerase complex at the promoter region of a DNA template resulting in the subsequent synthesis of RNA from that promoter.)
• an ID number (id: GO:0006352)
• parent nodes (Parent nodes: GO:0002221, is-a)
Clark et al., 2005

Current State of Function Annotation of Model Genomes
Sharan et al., Molecular Systems Biology, 2007

Whole genome analysis (J. D. Munkvold et al., 2004)

MicroArray data analysis
…analysis of high-throughput data according to GO
[Figure: gene tree over all genes (14010), attacked vs. control over time, with branches labeled by GO terms: puparial adhesion, molting cycle, hemocyanin, defense response, immune response, response to stimulus, Toll regulated genes, JAK-STAT regulated genes, amino acid catabolism, lipid metabolism, peptidase activity, protein catabolism]
Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.

Simple Function Prediction
• The easiest way to infer the molecular function of an uncharacterized sequence is to find an obvious (highly sequence-similar) and well-characterized homologue.
• BLAST (sequence-sequence local alignment tool)
• PSI-BLAST (profile-sequence local alignment tool)
• Problem: many proteins do not have obvious homologues

Integrative Approaches
• Similarity grouping
• Phylogenomics
• Sequence patterns
• Sequence clustering
• Machine learning
• Network approach
• Result: at least a coarse functional characterization

Similarity Group Methods
Idea: the sequences found in a similarity search will usually share some annotated functions, so some GO terms will be significantly enriched over others.

PFP Method
• Sequence hits are retrieved by a PSI-BLAST search.
• The GO terms associated with each hit are scored according to the alignment expectation value (E-value) provided by PSI-BLAST.
• The scores for terms associated with several sequence hits are combined by summation. This scoring system ranks GO terms according to both (1) their frequency of association with similar sequences and (2) the degree of similarity those sequences share with the query.
• A GO term, f_a, is scored as follows (Hawkins et al., Proteins, 2008):
[Equation not preserved in this text preview]
• where s(f_a) is the final score assigned to the GO term f_a; N is the number of similar sequences retrieved by PSI-BLAST; Nfunc(i) is the number of GO terms assigned to sequence i; Evalue(i) is the E-value given to sequence i; and f_j is a GO term assigned to sequence i. delta(f_j, f_a) returns 1 when f_j equals f_a, and 0 otherwise.
• The E-value threshold is set to 125.

Function Association Matrix
The Function Association Matrix (FAM) describes the probability that two GO terms are associated with the same sequence, based on the frequency at which they co-occur in UniProt sequences. This allows the FAM to associate function annotations from different GO categories. For example, the biological process ‘‘positive regulation of transcription, DNA-dependent’’ is strongly associated with the molecular function ‘‘DNA binding activity’’ (P(0045893|0003677) = 0.455).

Phylogenomic Approach
• The accuracy of annotation transfer can be increased further by taking the evolutionary relationships within protein families into account.
• This addresses the difference between orthologous and paralogous relatives of a query sequence (i.e. between relatives by speciation and relatives by gene duplication).
• A ‘‘duplication event’’ captures a single instance of a gene duplicating into divergent copies of that gene within a single genome.
• A ‘‘speciation event’’ captures a single instance of a gene in an ancestral species evolving into divergent copies of a gene in distinct genomes of different species.
Which event more likely preserves function?

Steps
• Find all homologues of the query sequence and align them.
• Build a phylogenetic tree and reconcile it (label every bifurcation in the tree as either a duplication or a speciation).
• Transfer functions (primarily) from orthologues.
Zmasek and Eddy, Bioinformatics, 2001

SIFTER
1.
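The PFP scoring rule and the Function Association Matrix described in the slides above can be illustrated with a small Python sketch. It is not the PFP implementation. The slide equation is not preserved in this text, so the per-hit weight used here, -log10(E-value) + b with b = log10 of the E-value threshold, is an assumption chosen so that every hit below the threshold contributes a non-negative amount. The FAM entry is likewise assumed to be a plain co-occurrence ratio with no smoothing. The function names (`pfp_scores`, `rank_terms`, `fam_probability`) and the input layout are hypothetical.

```python
import math
from collections import defaultdict

E_VALUE_THRESHOLD = 125.0            # threshold quoted in the slides
B = math.log10(E_VALUE_THRESHOLD)    # assumed offset keeping weights >= 0


def pfp_scores(hits, threshold=E_VALUE_THRESHOLD):
    """Score GO terms from PSI-BLAST hits.

    hits: iterable of (e_value, go_terms) pairs, one pair per retrieved
    sequence. Returns a dict mapping GO term -> summed score.
    """
    offset = math.log10(threshold)
    scores = defaultdict(float)
    for e_value, go_terms in hits:
        if e_value > threshold:
            continue  # hit is too weak to be used
        # Clamp to avoid log10(0) for hits reported with E-value 0.
        weight = -math.log10(max(e_value, 1e-300)) + offset
        # delta(f_j, f_a): each term annotated on this hit receives the
        # hit's weight once.
        for term in set(go_terms):
            scores[term] += weight
    return dict(scores)


def rank_terms(scores):
    """Return (term, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def fam_probability(annotations, term_a, term_b):
    """Estimate P(term_a | term_b) as a co-occurrence ratio.

    annotations: iterable of sets of GO terms, one set per sequence.
    Assumption: no pseudo-counts or other smoothing.
    """
    with_b = [terms for terms in annotations if term_b in terms]
    if not with_b:
        return 0.0
    both = sum(1 for terms in with_b if term_a in terms)
    return both / len(with_b)


if __name__ == "__main__":
    # Made-up hits for illustration only.
    example_hits = [
        (1e-10, ["GO:0003677", "GO:0045893"]),
        (1e-2, ["GO:0003677"]),
        (200.0, ["GO:0004497"]),  # above the threshold, ignored
    ]
    for term, score in rank_terms(pfp_scores(example_hits)):
        print(term, round(score, 3))
```

A term shared by many strong hits accumulates the largest sum, which matches the two ranking criteria listed for PFP: frequency of association and degree of similarity to the query.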

