Using phylogenetic profiles to predict protein function and localization

As discussed by Catherine Grasso

Papers

Pellegrini, et al. Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. (1999) PNAS 96, 4285-4288.
Marcotte, et al. Localizing proteins in the cell from their phylogenetic profiles. (2000) PNAS 97, 12115-12120.

Basic Idea:

Sequence alignment is a good way to infer protein function when two proteins do the exact same thing in two different organisms.
Proteins with > 30% sequence identity generally have the same fold, and typically the same function.

Basic Idea:

But can we decide whether two proteins function in the same pathway, such as histidine biosynthesis, or in the same biomolecular structure, such as the flagellum or the ribosome, even if they don’t do the exact same thing?
Yes. Assume that if two proteins function together, they must evolve in a correlated fashion: every organism that has a homolog of one of the proteins must also have a homolog of the other.

Phylogenetic Profile

For a given protein, BLAST against N sequenced genomes.
Construct a vector with N coordinates.
If the protein has a homolog in organism n, set coordinate n to 1. Otherwise set it to 0.
Protein P1: 0 0 1 0 1 1 0 0

Functional Link

Assign a degree of functional linkage between P1 and P2 based on the number of positions (or bits) at which their profiles differ.
Protein P2: 0 1 1 0 1 1 0 0
Protein P1: 0 0 1 0 1 1 0 0
(These two profiles differ at a single position.)

What They Did:

Computed phylogenetic profiles for 4,290 proteins in E. coli.
Aligned each protein sequence Pi with the proteins from 16 other fully sequenced genomes.
Genome n is defined as containing a homolog of Pi if one of its proteins aligns to Pi with a score that is deemed statistically significant.

Conclusions

Comparing profiles is a useful tool for identifying the complex or pathway in which a protein participates.
As the number of fully sequenced genomes increases, scientists will be able to construct longer, more informative profiles.
In 1999, 100 more genomes were due to be completed in the next few months.
This suggests that as eukaryotic genomes come out, profiles will be a useful tool for studying pathways in higher organisms.

Evolutionary Origin of Eukaryotic Cell

Mitochondria, chloroplasts and perhaps other organelles descended from microbes captured by progenitors of eukaryotic cells.
You exist because of a bad case of indigestion!

Evolutionary Origin of Eukaryotic Cell

This endosymbiosis was stabilized by the shifting of organelle genes into the nuclear genome and by the establishment of transport systems to shuttle organellar proteins from the cytoplasm into the organelles.
Contemporary mitochondrial genomes encode only a few genes (<20), primarily large integral membrane proteins which can’t be transported.

Evidence

Proteins of these organelles have molecular properties resembling prokaryotic rather than eukaryotic proteins:
1. Average lengths
2. Domain composition
3. Amino acid composition
4. Homologs among prokaryotes

Phylogenetic profiles

Will show that proteins with similar phylogenetic profiles localize to similar subcellular locations.
Actually, will primarily show this for the mitochondria.

Calculating phylogenetic profiles

In this study, the value at each position of the profile is equal to -1/log E, where E is the BLAST expectation value of the best-matching protein in a genome.
Calculated only for E < 1 x 10^-6; the value is set to 1.0 otherwise.
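
The two profile encodings described so far can be sketched in a few lines of Python: the binary profile compared by counting differing bits (Pellegrini et al.), and the real-valued -1/log E profile (Marcotte et al.). This is only an illustration under stated assumptions. The function names are hypothetical, the logarithm is assumed to be base 10, the 1 x 10^-6 cutoff is reused for the binary case, and a genome with no hit is represented as None. None of these details are confirmed by the slides.

```python
import math

# Assumed E-value cutoff. The slides quote 1e-6 for the real-valued profile;
# reusing it to call homologs in the binary profile is an assumption.
E_CUTOFF = 1e-6


def binary_profile(best_evalues, cutoff=E_CUTOFF):
    """Return 1 for each genome whose best BLAST hit has E < cutoff, else 0.

    best_evalues holds one entry per genome; None means no hit was found.
    """
    return [1 if e is not None and e < cutoff else 0 for e in best_evalues]


def hamming_distance(profile_a, profile_b):
    """Count the positions (bits) at which two binary profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def real_valued_profile(best_evalues, cutoff=E_CUTOFF):
    """Return -1/log10(E) for hits with E < cutoff, and 1.0 otherwise."""
    profile = []
    for e in best_evalues:
        if e is None or e >= cutoff:
            profile.append(1.0)  # no significant match in this genome
        elif e == 0.0:
            profile.append(0.0)  # limit of -1/log10(E) as E approaches 0
        else:
            profile.append(-1.0 / math.log10(e))
    return profile


# The two example profiles from the slides.
P1 = [0, 0, 1, 0, 1, 1, 0, 0]
P2 = [0, 1, 1, 0, 1, 1, 0, 0]
print(hamming_distance(P1, P2))

# Made-up E-values for four genomes, purely for illustration.
example_evalues = [1e-50, 1e-10, 0.5, None]
print(binary_profile(example_evalues))
print(real_valued_profile(example_evalues))
```

With a base-10 logarithm, a hit right at the cutoff scores about 0.17, and stronger hits score closer to 0.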
So zero is a perfect match and one is no match.

Three Categories

Prokaryote Derived: Only has homologs in prokaryotes.
Eukaryote Derived: Only has homologs in eukaryotes.
Organism Specific: Has no homologs.
Why split into these categories? They should have different functions and roles in mitochondria.

Linear Discriminant Functions

[Figure: MP vs. Non-MP, separated by threshold t]
Varying t increases prediction accuracy at the expense of coverage.

Testing Algorithm

First, predicted the location of yeast proteins of known location (open diamonds).
Second, a jackknife test was performed. Repeated 100 times with different random sets (filled diamonds). Coverage 58% at 50% accuracy.
Third, used yeast proteins as the training set and worm proteins as the test set. Coverage 65% at 50% accuracy.

Prediction

Applied algorithm to all yeast proteins. Estimate ~630 total mitochondrion-targeted genes in yeast, or 10% of genome.
Applied algorithm to all worm proteins. Estimate ~660 total mitochondrion-targeted genes in worms, or 4% of genome.

Verifications

Tested whether functions of newly predicted mitochondrial proteins matched functions of known mitochondrial proteins better than the functions of a random set of proteins. (Jaccard coefficient, pie charts)
Fraction of predicted mitochondrial proteins with predicted transmembrane segments or signal peptides.
2D gel of whole rat liver and human placental mitochondria reveals ~250-350 visible proteins.

Conclusions

There is information in the phylogenetic profiles, but it is quite noisy.
Yields approximate numbers of genes that migrated to the nuclear genomes from the mitochondria.
Gives even more evidence for endosymbiotic theory.
However, verifications did not confirm results as much as one might like.
Perhaps fundamental assumption