Using phylogenetic profiles to predict protein function and localization
As discussed by Catherine Grasso

Papers
Pellegrini et al. (1999). Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. PNAS 96, 4285-4288.
Marcotte et al. (2000). Localizing proteins in the cell from their phylogenetic profiles. PNAS 97, 12115-12120.

Basic Idea:
Sequence alignment is a good way to infer protein function when two proteins do exactly the same thing in two different organisms. Proteins with more than 30% sequence identity usually share the same fold, and typically the same function.

Basic Idea:
But can we decide whether two proteins function in the same pathway, such as histidine biosynthesis, or in the same biomolecular structure, such as the flagellum or the ribosome, even if they don't do exactly the same thing?
Yes. Assume that if two proteins function together, they must evolve in a correlated fashion: every organism that has a homolog of one of the proteins must also have a homolog of the other.

Phylogenetic Profile
For a given protein, BLAST it against N sequenced genomes.
Construct a vector with N coordinates.
If the protein has a homolog in organism n, set coordinate n to 1; otherwise set it to 0.
Protein P1: 0 0 1 0 1 1 0 0

Functional Link
Assign a degree of functional linkage between P1 and P2 based on the number of positions (bits) at which their profiles differ. The fewer the differing bits, the stronger the predicted link. Here the profiles differ only at the second position.
Protein P2: 0 1 1 0 1 1 0 0
Protein P1: 0 0 1 0 1 1 0 0

What They Did:
Computed phylogenetic profiles for 4,290 proteins in E. coli.
Aligned each protein sequence Pi with the proteins from 16 other fully sequenced genomes.
The proteins encoded by genome n are defined as including a homolog of Pi if one of them aligns to Pi with a statistically significant score.

Conclusions
Comparing profiles is a useful tool for identifying the complex or pathway in which a protein participates.
As the number of fully sequenced genomes increases, scientists will be able to construct longer, more informative profiles.
In 1999, 100 more genomes were due to be completed within the next few months. The authors suggest that, as eukaryotic genomes come out, profiles will be a useful tool for studying pathways in higher organisms.

Evolutionary Origin of Eukaryotic Cell
Mitochondria, chloroplasts and perhaps other organelles descended from microbes captured by the progenitors of eukaryotic cells. You exist because of a bad case of indigestion!
This endosymbiosis was stabilized in two ways. Genes from the organelle shifted into the nuclear genome, and transport systems were established to shuttle organellar proteins from the cytoplasm into the organelles.
Contemporary mitochondrial genomes encode only a few genes (<20), primarily large integral membrane proteins that can't be transported.

Evidence
Proteins of these organelles have molecular properties that resemble prokaryotic rather than eukaryotic proteins:
1. Average lengths
2. Domain composition
3. Amino acid composition
4. Homologs among prokaryotes

Phylogenetic profiles
The study aims to show that proteins with similar phylogenetic profiles localize to similar subcellular compartments. In practice, this is shown primarily for the mitochondrion.

Calculating phylogenetic profiles
In this study, the value at each position of the profile is $-1/\log E$, where $E$ is the BLAST expectation value of the best-matching protein in that genome.
The value is computed this way only for $E < 1 \times 10^{-6}$ and is set to 1.0 otherwise.
So 0 corresponds to a perfect match and 1 to no match.

Three Categories
Prokaryote-derived: has homologs only in prokaryotes.
Eukaryote-derived: has homologs only in eukaryotes.
Organism-specific: has no homologs.
Why split into these categories? Proteins in each category should have different functions and roles in the mitochondrion.

Linear Discriminant Functions
[Figure: mitochondrial (MP) vs. non-mitochondrial (non-MP) proteins separated by a threshold t]
Making t more stringent increases prediction accuracy at the expense of coverage.

Testing Algorithm
First, they predicted the locations of yeast proteins whose location is already known (open diamonds).
Second, they performed a jackknife test, repeated 100 times with different random sets (filled diamonds). Coverage: 58% at 50% accuracy.
Third, they used yeast proteins as the training set and worm proteins as the test set. Coverage: 65% at 50% accuracy.

Prediction
Applied the algorithm to all yeast proteins. Estimate: ~630 mitochondrion-targeted genes in yeast, or 10% of the genome.
Applied the algorithm to all worm proteins. Estimate: ~660 mitochondrion-targeted genes in worm, or 4% of the genome.

Verifications
Tested whether the functions of newly predicted mitochondrial proteins matched the functions of known mitochondrial proteins better than a random set of proteins did (Jaccard coefficient, pie charts).
Checked the fraction of predicted mitochondrial proteins with predicted transmembrane segments or signal peptides.
2D gels of whole rat liver and human placental mitochondria reveal ~250-350 visible proteins.

Conclusions
There is information in the phylogenetic profiles, but it is quite noisy.
The method yields approximate numbers of genes that migrated from the mitochondria to the nuclear genome.
It gives further evidence for the endosymbiotic theory.
However, the verifications did not confirm the results as strongly as one might like.
Perhaps fundamental assumption
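The Linear Discriminant Functions and Testing Algorithm slides describe two steps. Profiles are scored with a linear discriminant, and the threshold t then trades coverage for accuracy. The slides do not say how the discriminant is trained. The sketch below therefore assumes Fisher's linear discriminant over continuous profiles, with a small hypothetical `ridge` term to keep the scatter matrix invertible. It also assumes that "coverage" means the fraction of known mitochondrial proteins recovered and "accuracy" means the fraction of predictions that are correct. All function names and the toy profiles are hypothetical. This is an illustration of the idea, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mean(rows: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fisher_weights(mp: Sequence[Sequence[float]],
                   non_mp: Sequence[Sequence[float]],
                   ridge: float = 1e-3) -> List[float]:
    """Fisher direction w = (S_w + ridge*I)^-1 (mean_MP - mean_nonMP)."""
    mu_mp, mu_non = mean(mp), mean(non_mp)
    d = len(mu_mp)
    s_w = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, mu in ((mp, mu_mp), (non_mp, mu_non)):
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(d):
                for j in range(d):
                    s_w[i][j] += diff[i] * diff[j]
    return solve(s_w, [p - q for p, q in zip(mu_mp, mu_non)])


def score(w: Sequence[float], profile: Sequence[float]) -> float:
    """Discriminant score; larger means more mitochondrion-like."""
    return sum(wi * xi for wi, xi in zip(w, profile))


def coverage_accuracy(scores: Sequence[float], is_mp: Sequence[bool],
                      t: float) -> Tuple[float, float]:
    """Predict MP when score > t; return (coverage, accuracy)."""
    predicted = [s > t for s in scores]
    true_pos = sum(p and y for p, y in zip(predicted, is_mp))
    n_pred, n_mp = sum(predicted), sum(is_mp)
    coverage = true_pos / n_mp if n_mp else 0.0
    accuracy = true_pos / n_pred if n_pred else 0.0
    return coverage, accuracy


if __name__ == "__main__":
    # Hypothetical profiles over (prokaryote 1, prokaryote 2, eukaryote);
    # 0 = strong match, 1 = no match, as in the -1/log E encoding.
    mp = [[0.05, 0.10, 0.90], [0.10, 0.05, 1.00], [0.02, 0.20, 0.80]]
    non_mp = [[1.00, 1.00, 0.05], [0.90, 1.00, 0.10], [1.00, 0.80, 0.02]]
    w = fisher_weights(mp, non_mp)
    scores = [score(w, x) for x in mp + non_mp]
    labels = [True] * len(mp) + [False] * len(non_mp)
    for t in sorted(scores):
        print(f"t={t:.2f} (coverage, accuracy)={coverage_accuracy(scores, labels, t)}")
```

Raising t can only shrink the set of predicted MPs, so coverage never increases as t rises. A jackknife test in the spirit of the slides would repeat a three-step cycle many times. It would hold out a random subset of proteins of known location, fit `fisher_weights` on the rest, and compute `coverage_accuracy` on the held-out set.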
CORNELL CS 726 - Study Notes