Networks of Protein Interactions: Introduction and Integration
Balaji S. Srinivasan
CS 374, Lecture 5, 10/11/2005

Outline
- Overview
- Coexpression
- Coinheritance
- Colocation
- Coevolution
- Functional Genomics
- Integration Motivation
- How to use 2 predictors?
- Early Integration Hacks
- Recent work
- Recap
- Training Set
- Training Sets
- Bayes' Rule in 1D
- 2D Network Integration
- Hidden Biology
- Recap #2
- Using N predictors
- Binary Classifier Paradigm
- Blessing of Dimensionality
- Classifier builds network
- MreB
- CtrA and CcrM
- C. jejuni glycosylation
- Nets speed experiment
- Further Directions

Overview
- Genomics: 1 genome (Assembly, Gene Finding)
- Comparative Genomics: N genomes (Sequence Alignment)
- Functional Genomics: 1 assay (Microarray Analysis)
- Integrative Genomics: N assays (Network Integration, this talk)

Coexpression
- Input: microarray data (expression matrix, genes × arrays)
- Score: Pearson correlation between gene expression profiles
[Figure: correlation matrix over Gene A, Gene B, Gene C with diagonal 1 and off-diagonal values .8, -.6, -.7, redrawn as a graph with those edge weights]

Coinheritance
- Input: BLAST bit scores (inheritance matrix, proteins × species)
- Score: Spearman correlation between inheritance profiles
[Figure: bit-score table for Protein A, B, C across Species 1-4 (values 600, 200, 300, 100, 500, 100, 300, 200, 400, 250, 250, 50); correlation matrix with diagonal 1 and off-diagonal values .95, -.95, -1, redrawn as a graph]

Colocation
- Input: assembled genomes (location matrix, proteins × chromosomes)
- Score: average chromosomal distance
[Figure: location table for Protein A, B, C across Chrom 1-4 (values .6, .2, .3, .1, .5, .1, .3, .2, .4, .25, .25, .05); distance matrix with diagonal 0 and off-diagonal values .06, .25, .25, redrawn as a graph]

Coevolution
- Input: multiple alignments of protein families (A, A', A'', A'''; B, B', B'', B'''; C, C', C'', C''')
- Score: tree distances
[Figure: matrix over Prt Fam A, B, C with diagonal 1 and off-diagonal values .9, -.7, -.8, redrawn as a graph]

Functional Genomics
- Many others…
- Experimental
  - TAP + Mass Spec
  - Y2H
  - Pheno & antibody arrays
  - Synthetic lethal
  - RNAi knockdown
- Computational
  - Rosetta Stone (conserved domain)
  - Shared Operon
  - PSIMAP

Integration Motivation
- Can we combine data?
- Example: Caulobacter crescentus flagellar proteins
  - Coexpression cluster
  - Compare to coinheritance
  - Potential for integration…
[Figure: panels labeled Coexpression and Coinheritance]

How to use 2 predictors?
- Agree & disagree…
- Scales, noise levels, sources all very different
- Can we do network integration?
[Figure: coinheritance network ≠ coexpression network]

Early Integration Hacks
- Given 2 nets:
  - coexpression: $G_1 = (V_1, E_1)$
  - coinheritance: $G_2 = (V_2, E_2)$
  - $V_1, V_2 \in V$, the set of all proteins
- intersection: $E_{isc} = 1$ if $(E_1 > T_1)$ && $(E_2 > T_2)$
- union: $E_{union} = 1$ if $(E_1 > T_1)$ || $(E_2 > T_2)$
- average weights: $E_{avg} = .5(E_1 + E_2)$

Early Integration Hacks
- Intersection: too strict
- Union: too lenient
- Average: too dumb :)
[Figure: coexpression edges (.9, .8, .7, .6) + coinheritance edges (.5, .7, .8, .9) = intersection, union, and average networks; average edge weights shown: .65, .35, .45, .75, .35, .4]

Early Integration Hacks
- Useful, but dumb…
- All data equal?
- No explicit statistical formulation
  - different noise levels
  - different intervals
- Uninformed by prior data…
[Figure: average network (.65, .35, .85, .75, .35), labeled "Too dumb"]

Recent work
- Bayesian Networks (Troyanskaya 2003)
- Decision Trees (Wong 2004)
- Naïve Bayes + Boosting (Lu 2005)
- Likelihood Ratios (Lee 2004)

Recent work
- Major innovation: Training Set
  - MIPS, "Gold Standard" (Gerstein)
  - SSL, synthetic lethals (Wong)
  - DIP (Marcotte)
- Defines the signal
- What is our algorithm learning?
[Figure: KEGG (Pyrimidine Metabolism)]

Recent work
- Major limitations
- Method specific
  - Decision trees: binary coding
  - Bayesian Networks: need to poll people for prior
- All methods
  - Biological: limited to yeast
  - Statistical: dependency hacks!
    - Lee: heuristic weighting
    - Naïve Bayes
[Figure: panels labeled Naïve Bayes (Lu 2005) and Heuristic Weighting (Lee 2004)]

Recap
- Just shown
  - Functional Genomics
  - Integration Problem
  - Previous work
    - all in S. cerevisiae
    - major innovation: training set
    - major shortcoming: dependence hacks
- To come
  - training set, common scale
  - rigorous statistical dependence
  - microbes only (for now…)
[Figure: coexpression + coinheritance + colocation + …]

Training Set
- Observation: known linkages for a nontrivial fraction of pairs
- Caulobacter crescentus
  - KEGG: 783 of 3737 proteins in 1 or more KEGG pathways
  - Ex: pyrimidine metabolism, pathway 240

Training Set
- Tabulate pairs
  - 1 if shared COG/KEGG/GO
  - 0 if unshared
  - ? if one or both unknown
- Most pairs totally unknown…

Training Sets
- Most pairs totally unknown…
- Caulobacter crescentus: 3737 proteins, 783 KEGG
  - small in relative terms
  - large in absolute terms
- All pairs: L = 0, 1, ?
  $\binom{3737}{2} = 6980716$ pairs $= 6667480$ pairs $+ 298961$ pairs $+ 14275$ pairs
- Relative frequency, training pairs vs. all pairs:
  $\binom{783}{2} / \binom{3737}{2} \approx 0.043$

Training Sets
[Figure: all 6980716 pairs (L = 0, 1, ?) split into 6667480 pairs, 298961 pairs, and 14275 pairs]

Training Sets
- Training data is crucial
  - Reveals hidden structure
  - Small effort yields large payoff
  - L = 0, 1, ? stats
- Puts data on a common scale
  - meter in biology (predictive power), not physics (units)
[Figure: raw data → add training set → hidden structure]

Bayes' Rule in 1D
- Predict linkages
  - Bayes' Rule
  - Coexpression
- Evaluate posterior at millions of pairs
  - $P(L=1 \mid E)$ for L = ?
- Optimal decision rule
  - "Bayes error rate" = minimum error rate of a classifier
- Bayes' Rule: calculate the conditional probability of linkage given evidence

  $$P(L \mid E) = \frac{P(E \mid L)\,P(L)}{\sum_{L'} P(E \mid L')\,P(L')}$$

2D Network Integration
- Account for statistical dependence
- 2D scatterplot: coexpression vs. coinheritance

2D Network Integration
- Estimate densities
  - Kernel Density Estimation
  - Gray-Moore dual tree algorithm (digression #1)

2D Network Integration
[Figure]

2D Network Integration
- Posterior probability of interaction, $P(L=1 \mid E)$
- Visual, geometric interpretation
[Figure: contours at $P(L=1 \mid E) = .9$, $P(L=1 \mid E) = .5$, $P(L=1 \mid E) = .1$]

2D Network Integration
- Hacks revisited
  - Intersection
  - Union
  - Average
- All are suboptimal… including decision trees, naïve Bayes, etc.

Hidden Biology
- Dividend of Network Integration
  - Joint density reveals hidden biology
  - Moderate evidence from multiple sources!
  - Subtle interactions missed by univariate methods…

Recap #2
- Just shown
  - Training set: scale to common axes
  - Scatterplot + KDE
  - Posterior probability of interaction
  - Hidden biology
- To show
  - generalizations
  - N evidences, arbitrary microbes…

Using N predictors
- Example with N = 3 (coinheritance, colocation, coexpression)
  - note evidence coupling
  - high colocation compensates for low coexpression
  - nonlinear relation revealed by joint density…
[Figure: $P(E_1, E_2, E_3 \mid L=1)$, $P(E_1, E_2, E_3 \mid L=0)$, and $P(L=1 \mid E_1, E_2, E_3)$; axes: coexpression ($E_1$), colocation ($E_2$), coinheritance ($E_3$)]

Binary Classifier Paradigm
- Pair with unknown linkage status
- given interaction
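
The Coexpression and Coinheritance slides score each pair of genes or proteins by Pearson or Spearman correlation of their profiles. A minimal sketch of that step follows, in plain Python. The function names and the toy expression profiles are assumptions made for illustration; they are not data from the lecture.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def pairwise_network(profiles, similarity):
    """Map each unordered pair of names to the similarity of their profiles."""
    names = sorted(profiles)
    return {(a, b): similarity(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}

# Hypothetical toy profiles (one value per array), for illustration only.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.3, 2.9, 4.2],
    "geneC": [4.0, 3.1, 2.2, 0.9],
}
coexpression_net = pairwise_network(toy_expression, pearson)
for pair, weight in sorted(coexpression_net.items()):
    print(pair, round(weight, 2))
```

Passing `spearman` instead of `pearson` gives the rank-based variant used for bit-score profiles.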
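
The three rules from the Early Integration Hacks slides (intersection, union, average) can be sketched as below. The dictionary representation, the treatment of a missing edge as weight 0, and the toy weights and thresholds are assumptions for illustration only.

```python
def integrate(e1, e2, t1, t2):
    """Combine two weighted networks with the three naive rules.

    e1, e2: dicts mapping a protein pair to an edge weight.
    t1, t2: per-network thresholds.
    Assumption: a pair absent from a network has weight 0.0 there.
    """
    combined = {}
    for pair in set(e1) | set(e2):
        w1, w2 = e1.get(pair, 0.0), e2.get(pair, 0.0)
        hit1, hit2 = w1 > t1, w2 > t2
        combined[pair] = {
            "intersection": int(hit1 and hit2),  # both sources must agree
            "union": int(hit1 or hit2),          # either source suffices
            "average": 0.5 * (w1 + w2),          # treats all data as equal
        }
    return combined

# Hypothetical toy networks over proteins A, B, C.
coexpression = {("A", "B"): 0.9, ("A", "C"): 0.2}
coinheritance = {("A", "B"): 0.5, ("A", "C"): 0.1}
result = integrate(coexpression, coinheritance, t1=0.6, t2=0.6)
for pair in sorted(result):
    print(pair, result[pair])
```

In this toy case the pair (A, B) passes the union rule but fails the intersection rule, which shows how the two rules disagree when only one source is confident.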
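
The labeling rule from the Training Set slides (1 if shared, 0 if unshared, ? if one or both unknown) can be sketched as follows. The function name, the protein names, and the pathway identifiers are hypothetical; only the protein count 3737 and the pair counts come from the slides.

```python
from itertools import combinations
from math import comb

def label_pairs(proteins, pathways):
    """Label every unordered protein pair as 1, 0, or "?".

    pathways: dict mapping a protein to its set of pathway ids.
    A protein that is missing or has an empty set counts as unknown.
    """
    labels = {}
    for a, b in combinations(sorted(proteins), 2):
        pa, pb = pathways.get(a), pathways.get(b)
        if not pa or not pb:
            labels[(a, b)] = "?"   # one or both unannotated
        elif pa & pb:
            labels[(a, b)] = 1     # share at least one pathway
        else:
            labels[(a, b)] = 0     # both annotated, nothing shared
    return labels

# Hypothetical annotations; p4 is unannotated.
toy_pathways = {"p1": {"path_x"}, "p2": {"path_x", "path_y"}, "p3": {"path_y"}}
toy_labels = label_pairs(["p1", "p2", "p3", "p4"], toy_pathways)
for pair, label in sorted(toy_labels.items()):
    print(pair, label)

# Total number of pairs for 3737 proteins, as quoted on the slide.
total_pairs = comb(3737, 2)
print(total_pairs)  # 6980716
print(total_pairs == 6667480 + 298961 + 14275)  # True
```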
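
The posterior from the Bayes' Rule and 2D Network Integration slides can be sketched by estimating P(E | L=1) and P(E | L=0) from training pairs with kernel density estimation, then applying Bayes' rule. This sketch is written under stated assumptions: it uses a naive Gaussian-kernel sum rather than the Gray-Moore dual tree algorithm named on the slide, a fixed hypothetical bandwidth, a prior taken from training-set frequencies, and made-up evidence points. Because evidence vectors may have any length, the same code covers the N-predictor case.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a density estimate built from d-dimensional points.

    Naive O(n) evaluation per query with an isotropic Gaussian kernel.
    """
    n, d = len(points), len(points[0])
    norm = n * (bandwidth * math.sqrt(2 * math.pi)) ** d

    def density(x):
        total = 0.0
        for p in points:
            sq_dist = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
            total += math.exp(-0.5 * sq_dist / bandwidth ** 2)
        return total / norm

    return density

def posterior_linked(x, linked, unlinked, bandwidth=0.2):
    """P(L=1 | E=x) via Bayes' rule with KDE class-conditional densities."""
    prior1 = len(linked) / (len(linked) + len(unlinked))
    num = gaussian_kde(linked, bandwidth)(x) * prior1
    den = num + gaussian_kde(unlinked, bandwidth)(x) * (1 - prior1)
    if den == 0.0:
        return prior1  # no evidence either way: fall back to the prior
    return num / den

# Hypothetical (coexpression, coinheritance) evidence for training pairs.
linked_pairs = [(0.8, 0.7), (0.9, 0.8), (0.7, 0.9)]
unlinked_pairs = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (0.3, 0.0)]

for query in [(0.8, 0.8), (0.5, 0.5), (0.1, 0.1)]:
    p = posterior_linked(query, linked_pairs, unlinked_pairs)
    print(query, round(p, 3))
```

Because the two densities are estimated jointly over both evidence axes, no independence between coexpression and coinheritance is assumed, which is the point the slides make against naïve Bayes.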