Mizzou INFOINST 8010 - Protein Function Prediction (66 pages)

Previewing pages 1, 2, 3, 4, 31, 32, 33, 34, 35, 63, 64, 65, 66 of the 66-page document.

Protein Function Prediction




Pages: 66
School: University of Missouri
Course: Infoinst 8010 - Problem Solving in Bioinformatics

Protein Function Prediction
Jianlin Cheng, PhD
Department of Computer Science, Informatics Institute
2009

References
Slides and documents at www.geneontology.org
R. Rentzsch and C. A. Orengo, Trends in Biotechnology, 2009

Widely Used Systems for Protein Function Definition
Enzyme Commission (EC)
Transporter Classification (TC)
Riley scheme: assigns prokaryotic gene products to cellular processes
The MIPS Functional Catalogue (FUNCAT): extension of Riley to all three domains of life
Kyoto Encyclopedia of Genes and Genomes (KEGG)
Gene Ontology (GO): molecular function, biological process, and cellular component

The Gene Ontology is widely adopted (AgBase, www.geneontology.org).

Gene Ontology
Biological process ontology: Which process is a gene product involved in?
Molecular function ontology: Which molecular function does a gene product have?
Cellular component ontology: Where does a gene product act?

Example: mitochondrial P450
Where is it? GO cellular component term: GO:0005743
What does it do? substrate + O2 -> CO2 + H2O + product (monooxygenase activity). GO molecular function term: GO:0004497
Which process is this? Electron transport (http://ntri.tamuk.edu/cell/mitochondrion/krebpic.html). GO biological process term: GO:0006118

Molecular function ontology
Nucleic acid binding is a type of binding (is a).
DNA binding is a type of nucleic acid binding (is a).

Biological process ontology
Adaxial/abaxial pattern formation is a type of pattern specification (is a).
Adaxial/abaxial pattern specification is a part of adaxial/abaxial pattern formation (part of).

Cellular component ontology
The nucleus is part of the intracellular domain (part of).
A membrane-bound organelle is a type of organelle (is a).

Categorizing gene products is called annotation
Process: the gene product "inner no outer" is involved in adaxial/abaxial axis specification.
Function: the gene product "inner no outer" has transcription factor activity.
Component: the gene product "inner no outer" is active in the nucleus.
(Clark et al., 2005)
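The "is a" and "part of" relationships above link each term to broader parent terms, so an ontology can be walked like a graph. The following is a minimal Python sketch of that idea, not the Gene Ontology's own tooling: the edge table `PARENTS` and the function `ancestors` are hypothetical names, and the table contains only the relationships named in the text.

```python
from collections import deque

# Toy edge table built only from the relationships named in the text.
# Each child term maps to a list of (relation, parent) pairs.
PARENTS = {
    "DNA binding": [("is_a", "nucleic acid binding")],
    "nucleic acid binding": [("is_a", "binding")],
    "adaxial/abaxial pattern formation": [("is_a", "pattern specification")],
    "adaxial/abaxial pattern specification": [
        ("part_of", "adaxial/abaxial pattern formation")
    ],
    "membrane-bound organelle": [("is_a", "organelle")],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent edges."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("DNA binding")))
# -> ['binding', 'nucleic acid binding']
```

Under this reading, a gene product annotated with DNA binding is also covered by the broader terms nucleic acid binding and binding.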
Fun biological process: courtship behavior

The Gene Ontology is like a dictionary
Each concept has a name, a definition, and an ID number.
term: transcription initiation
id: GO:0006352
definition: Processes involved in the assembly of the RNA polymerase complex at the promoter region of a DNA template, resulting in the subsequent synthesis of RNA from that promoter.
parent nodes: GO:0002221 (is a)
(Clark et al., 2005)

Current State of Functional Annotation of Model Genomes
(Sharan et al., Molecular Systems Biology, 2007)

Whole-genome analysis (J. D. Munkvold et al., 2004)
Analysis of high-throughput data according to GO.

Microarray data analysis
[Figure: gene tree clustered by Pearson correlation over time, attacked vs. control samples, all genes (14010). GO categories labeled in the figure: defense response, immune response, response to stimulus, Toll-regulated genes, JAK-STAT-regulated genes, puparial adhesion, molting cycle, hemocyanin, amino acid catabolism, lipid metabolism, peptidase activity, protein catabolism. Credit: Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL, and Eugene Schuster Group, EBI.]

Simple Function Prediction
The easiest way to infer the molecular function of an uncharacterized sequence is to find an obvious, well-characterized homologue with high sequence similarity.
BLAST: sequence-sequence local alignment tool
PSI-BLAST: profile-sequence local alignment tool
Problem: many proteins do not have obvious homologues.

Integrative Approaches
Similarity grouping
Phylogenomics
Sequence patterns
Sequence clustering
Machine learning
Network approach
Result: at least a coarse functional characterization.

Similarity Group Methods
Idea: the sequences found in a similarity search will usually share some annotated functions, so some GO terms will be significantly enriched over others.

PFP Method
Sequence hits are retrieved by a PSI-BLAST search. The associated GO terms are scored according to the alignment expectation value (E-value) provided by PSI-BLAST. The scores for terms associated with several sequence hits are combined by summation. This scoring system ranks GO terms according to both (1) their frequency of association with similar sequences and (2) the degree of similarity those sequences share with the query.

The score of a GO term f_a (Hawkins et al., Proteins 2008) is defined in terms of the following quantities:
s(f_a) is the final score assigned to the GO term f_a.
N is the number of similar sequences retrieved by PSI-BLAST.
Nfunc(i) is the number of GO terms assigned to sequence i.
Evalue(i) is the E-value given to sequence i.
f_j is a GO term assigned to sequence i.
delta(f_j, f_a) returns 1 when f_j equals f_a and 0 otherwise.
The E-value threshold is set to 125.

Function Association Matrix
The Function Association Matrix (FAM) describes the probability that two GO terms are associated with the same sequence, based on the frequency at which they co-occur in UniProt sequences. This allows the FAM to associate function annotations from different GO categories. For example, the biological process "positive regulation of transcription, DNA-dependent" is strongly associated with the molecular function "DNA binding activity": P(0045893 | 0003677) = 0.455.

Phylogenomic Approach
The accuracy of annotation transfer can be increased further by taking the evolutionary relationships within protein families into account. This addresses the difference between orthologous and paralogous relatives of a query sequence, i.e. between relatives by speciation and relatives by gene duplication.
A duplication event captures a single instance of a gene duplicating into divergent copies of that gene within a single genome.
A speciation event captures a single instance of a gene in an ancestral species evolving into divergent copies of a gene in distinct genomes of different species.
Which event more likely preserves function?

Steps
1. Find all homologues of the query sequence and align them.
2. Build a phylogenetic tree and reconcile it: mark every bifurcation in the tree as either a duplication or a speciation.
3. Transfer functions primarily from orthologues.
(Zmasek and Eddy, Bioinformatics, 2001)

SIFTER
1. Given a query protein, we find a Pfam family of a homologous domain and extract the multiple sequence alignment from the Pfam database.
2. Build a rooted phylogenetic tree with PAUP version [...] using parsimony with the BLOSUM50 matrix.
3. Apply Forester version 1.92 to estimate the ...
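
Returning to the PFP scoring scheme described earlier, the summation over PSI-BLAST hits can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the published implementation. The scoring formula itself is not reproduced in the text, so the per-hit contribution used here, -log10(E-value) + b with b = log10(125), is an assumption chosen so that a hit exactly at the stated E-value threshold contributes zero. The function name `pfp_scores`, the E-value floor, and the example hits are hypothetical.

```python
import math
from collections import defaultdict

E_VALUE_THRESHOLD = 125.0          # threshold stated in the text
B = math.log10(E_VALUE_THRESHOLD)  # assumed offset: a hit at the threshold scores 0
MIN_E_VALUE = 1e-200               # assumed floor so an E-value of 0.0 can be logged


def pfp_scores(hits):
    """Score GO terms from PSI-BLAST hits.

    `hits` is a list of (e_value, go_terms) pairs, one per retrieved sequence.
    Adding each hit's weight to every term it carries is equivalent to the
    double sum over hits i and their terms f_j with delta(f_j, f_a).
    """
    scores = defaultdict(float)
    for e_value, go_terms in hits:
        if e_value > E_VALUE_THRESHOLD:
            continue  # hit is weaker than the threshold, ignore it
        weight = -math.log10(max(e_value, MIN_E_VALUE)) + B  # assumed form
        for term in set(go_terms):
            scores[term] += weight
    return dict(scores)


# Hypothetical hits: the E-values are invented, the GO IDs appear in the text.
example_hits = [
    (1e-50, ["GO:0003677", "GO:0045893"]),
    (1e-3, ["GO:0003677"]),
    (50.0, ["GO:0004497"]),
]
ranked = sorted(pfp_scores(example_hits).items(), key=lambda kv: kv[1], reverse=True)
for term, score in ranked:
    print(f"{term}\t{score:.2f}")
```

In this toy input GO:0003677 ranks first because it is carried by two hits, which illustrates how the score rewards both frequency of association and degree of similarity.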

