Stanford CS 374 - Introduction and Integration

School: Stanford University

Course: CS 374 - Algorithms in Biology

Pages: 39

 1. Networks of Protein Interactions: Introduction and Integration
 2. Overview
 3. Coexpression
 4. Coinheritance
 5. Colocation
 6. Coevolution
 7. Functional Genomics
 8. Integration Motivation
 9. How to use 2 predictors?
10. Early Integration Hacks
11. Slide 11
12. Slide 12
13. Recent work
14. Slide 14
15. Slide 15
16. Recap
17. Training Set
18. Slide 18
19. Training Sets
20. Slide 20
21. Slide 21
22. Bayes' Rule in 1D
23. 2D Network Integration
24. Slide 24
25. Slide 25
26. Slide 26
27. 2D Network Integration
28. Hidden Biology
29. Recap #2
30. Using N predictors
31. Binary Classifier Paradigm
32. Blessing of Dimensionality
33. Classifier builds network
34. MreB
35. CtrA and CcrM
36. C. jejuni glycosylation
37. Nets speed experiment
38. Slide 38
39. Further Directions

Networks of Protein Interactions
Introduction and Integration
  Balaji S. Srinivasan
  CS 374
  Lecture 5, 10/11/2005

Overview
  Genomics: 1 genome (Assembly, Gene Finding)
  Comparative Genomics: N genomes (Sequence Alignment)
  Functional Genomics: 1 assay (Microarray Analysis)
  Integrative Genomics: N assays (Network Integration, this talk)

Coexpression
  Microarray data: expression (genes x arrays)
  Pearson correlation between genes
  [Figure: correlation matrix for Gene A, Gene B, Gene C; diagonal 1, off-diagonal entries .8, -.7, -.6]

Coinheritance
  BLAST bit scores: inheritance (proteins x Species 1 to 4)
    Protein A   600  200  300  100
    Protein B   500  100  300  200
    Protein C   400  250  250   50
  Spearman correlation between proteins
  [Figure: correlation matrix for Protein A, Protein B, Protein C; diagonal 1, off-diagonal entries .95, -1, -.95]

Colocation
  Assembled genomes: location (proteins x Chrom 1 to 4)
    Protein A   .6   .2   .3   .1
    Protein B   .5   .1   .3   .2
    Protein C   .4   .25  .25  .05
  Average chromosomal distance between proteins
  [Figure: distance matrix for Protein A, Protein B, Protein C; diagonal 0, off-diagonal entries .06, .25, .25]

Coevolution
  Multiple alignments: evolution of protein families A (A, A', A'', A'''), B (B, B', B'', B''') and C (C, C', C'', C''')
  Tree distances between families
  [Figure: matrix for Prt Fam A, Prt Fam B, Prt Fam C; diagonal 1, off-diagonal entries .9, -.8, -.7]

Functional Genomics
  Many others…
  Experimental
    TAP + Mass Spec
    Y2H
    Pheno & antibody arrays
    Synthetic lethal
    RNAi knockdown
  Computational
    Rosetta Stone (conserved domain)
    Shared Operon
    PSIMAP

Integration Motivation
  Can we combine data?
  Example: Caulobacter crescentus flagellar proteins
    Coexpression cluster
    Compare to coinheritance
    Potential for integration…
  [Figure panels: Coexpression, Coinheritance]

How to use 2 predictors?
  Agree & disagree…
  Scales, noise levels, sources: very different
  Can we do network integration?
  [Figure: coexpression ≠ coinheritance]

Early Integration Hacks
  Given 2 nets
    intersection
    union
    average weights
  $G_1 = (V_1, E_1)$: coexpression
  $G_2 = (V_2, E_2)$: coinheritance
  $V_1, V_2 \subseteq V$, the set of all proteins
  $E_{isc} = 1$ if $(E_1 > T_1) \land (E_2 > T_2)$
  $E_{union} = 1$ if $(E_1 > T_1) \lor (E_2 > T_2)$
  $E_{avg} = .5\,(E_1 + E_2)$

Early Integration Hacks
  Intersection: too strict
  Union: too lenient
  Average: too dumb :)
  [Figure: Coexpression (edge weights .9, .8, .7, .6) + Coinheritance (edge weights .5, .7, .8, .9) = Intersection, Union, Average (edge weights .65, .35, .45, .75, .35, .4)]

Early Integration Hacks
  Useful, but dumb…
  All data equal?
  No explicit, statistical formulation
    diff noise levels
    diff intervals
  Uninformed by prior data…
  [Figure: Average network, "Too dumb"]

Recent work
  Bayesian Networks (Troyanskaya 2003)
  Decision Trees (Wong 2004)
  Naïve Bayes + Boosting (Lu 2005)
  Likelihood Ratios (Lee 2004)

Recent work
  Major innovation: Training Set
    MIPS, "Gold Standard" (Gerstein)
    SSL, synthetic lethals (Wong)
    DIP (Marcotte)
  Defines the signal
    What is our algorithm learning?
  [Figure: KEGG (Pyrimidine Metabolism)]

Recent work
  Major limitations
  Method specific
    Decision trees: binary coding
    Bayesian Networks: need to poll people for prior
  All methods
    Biological: limited to yeast
    Statistical: dependency hacks!
      Lee: heuristic weighting
      Naïve Bayes
  [Figure panels: Naïve Bayes (Lu 2005), Heuristic Weighting (Lee 2004)]

Recap
  Just shown
    Functional Genomics
    Integration Problem
    Previous work
      all in S. cerevisiae
      major innovation: training set
      major shortcoming: dependence hacks
  To come
    training set, common scale
    rigorous statistical dependence
    microbes only (for now…)
  [Figure: coexpression + coinheritance + colocation + …]

Training Set
  Observation
    Known linkages for nontrivial fraction of pairs
  Caulobacter crescentus
    KEGG: 783 of 3737 proteins in 1 or more KEGG pathways
    Ex: pyrimidine metabolism, pathway 240

Training Set
  Tabulate pairs
    1 if shared COG/KEGG/GO
    0 if unshared
    ? if one or both unknown
  Most pairs totally unknown…

Training Sets
  Most pairs totally unknown…
  Caulobacter crescentus
    3737 proteins, 783 KEGG
    small in relative terms
    large in absolute terms
  All pairs (L = 0, 1, ?): $\binom{3737}{2} = 6980716$ pairs
  6667480 pairs + 298961 pairs + 14275 pairs = 6980716 pairs
  relative frequency, training pairs vs. all pairs: $\binom{783}{2} / \binom{3737}{2} \approx .043$

Training Sets
  [Figure: All pairs (L = 0, 1, ?), 6980716 pairs, split into 6667480 pairs, 298961 pairs and 14275 pairs]

Training Sets
  Training data is crucial
    Reveals hidden structure
    Small effort yields large payoff
  L = 0, 1, ? stats
    Puts data on common scale
    meter in biology (predictive power), not physics (units)
  [Figure: raw data -> add training set -> hidden structure]

Bayes' Rule in 1D
  Predict Linkages
    Bayes' Rule
    Coexpression
  Evaluate posterior at millions of pairs
    P(L=1|E) for L=?
  Optimal decision rule
    "Bayes error rate" = min. error rate of classifier
  Bayes' Rule: calculate conditional probability of linkage given evidence
    $P(L \mid E) = \dfrac{P(E \mid L)\,P(L)}{\sum_{L'} P(E \mid L')\,P(L')}$

2D Network Integration
  Account for statistical dependence
  2D Scatterplot
    coexpression vs. coinheritance

2D Network Integration
  Estimate densities
  Kernel Density Estimation
    Gray-Moore dual tree algorithm (digression #1)

2D Network Integration

2D Network Integration
  Posterior probability of interaction
    P(L=1|E)
    visual, geometric interpretation
  [Figure: regions labeled $P(L=1 \mid E) = .9$, $P(L=1 \mid E) = .5$, $P(L=1 \mid E) = .1$]

2D Network Integration
  Hacks revisited
    Intersection
    Union
    Average
  All are suboptimal…
    including decision trees, naïve Bayes, etc.

Hidden Biology
  Dividend of Network Integration
    Joint density reveals hidden biology
    Moderate evidence from multiple sources!
    Subtle interactions missed by univariate methods…

Recap #2
  Just shown
    Training set: scale to common axes
    Scatterplot + KDE
    Posterior probability of interaction
    Hidden biology
  To show
    generalizations
    N evidences, arbitrary microbes…

Using N predictors
  Example with N = 3 (coinheritance, colocation, coexpression)
    note evidence coupling
    high colocation compensates for low coexpression
    nonlinear relationship revealed by joint density…
  [Figure panels: $P(E_1, E_2, E_3 \mid L=1)$, $P(E_1, E_2, E_3 \mid L=0)$, $P(L=1 \mid E_1, E_2, E_3)$; axes: coexpression ($E_1$), colocation ($E_2$), coinheritance ($E_3$)]

Binary Classifier Paradigm
  Pair w/ unknown linkage status
  given interaction
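
The three rules from the "Early Integration Hacks" slides can be sketched in a few lines of Python. This is only an illustration, not the lecture's code. It takes intersection to mean that both edge weights pass their thresholds and union to mean that either one does. The `integrate` function, the `Network` type, the protein names, the weights and the thresholds are all hypothetical, and treating a missing edge as weight 0 is an assumption of this sketch.

```python
from typing import Dict, FrozenSet, Tuple

# An undirected network: each edge is a frozenset of two protein names,
# mapped to its weight (e.g. a correlation). All names here are hypothetical.
Network = Dict[FrozenSet[str], float]


def integrate(e1: Network, e2: Network, t1: float, t2: float
              ) -> Tuple[Dict, Dict, Network]:
    """Combine two networks by intersection, union, and average weight.

    Assumption of this sketch: an edge missing from a network has weight 0.
    """
    isc, union, avg = {}, {}, {}
    for pair in set(e1) | set(e2):
        w1, w2 = e1.get(pair, 0.0), e2.get(pair, 0.0)
        isc[pair] = int(w1 > t1 and w2 > t2)    # both sources must pass
        union[pair] = int(w1 > t1 or w2 > t2)   # either source suffices
        avg[pair] = 0.5 * (w1 + w2)             # equal weight to each source
    return isc, union, avg


# Made-up weights for two protein pairs.
ab, ac = frozenset({"A", "B"}), frozenset({"A", "C"})
coexpression = {ab: 0.9, ac: 0.4}
coinheritance = {ab: 0.5, ac: 0.3}
isc, union, avg = integrate(coexpression, coinheritance, t1=0.6, t2=0.6)
print(isc[ab], union[ab], avg[ab])
```

For the pair A-B the two sources disagree, so intersection drops the edge, union keeps it, and the average lands in between. None of the three rules uses a training set or models the dependence between the two evidence types, which is the shortcoming the slides point out.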

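The posterior P(L=1|E) from the Bayes' rule and 2D network integration slides can be sketched as follows. This is a minimal illustration under stated assumptions, not the lecture's implementation. It uses a brute-force Gaussian kernel density estimate rather than the Gray-Moore dual tree algorithm the slides mention. The prior P(L=1) is taken from the label frequencies in the training pairs, which is an assumption of this sketch. The function names, the bandwidth `h` and the evidence values are hypothetical.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]  # one evidence vector, e.g. (coexpression, coinheritance)


def gaussian_kde(samples: Sequence[Point], h: float) -> Callable[[Point], float]:
    """Brute-force kernel density estimate with an isotropic Gaussian kernel."""
    d = len(samples[0])
    norm = 1.0 / (len(samples) * (h * math.sqrt(2.0 * math.pi)) ** d)

    def density(x: Point) -> float:
        total = 0.0
        for s in samples:
            sq = sum((xi - si) ** 2 for xi, si in zip(x, s))
            total += math.exp(-0.5 * sq / h ** 2)
        return norm * total

    return density


def posterior_linked(x: Point, linked: Sequence[Point],
                     unlinked: Sequence[Point], h: float = 0.2) -> float:
    """P(L=1 | E=x) by Bayes' rule with KDE class-conditional densities."""
    # Assumption: the prior comes from the training-set label frequencies.
    prior1 = len(linked) / (len(linked) + len(unlinked))
    f1 = gaussian_kde(linked, h)(x)      # estimate of P(E | L=1)
    f0 = gaussian_kde(unlinked, h)(x)    # estimate of P(E | L=0)
    evidence = f1 * prior1 + f0 * (1.0 - prior1)
    # Far from all training points both densities underflow to 0;
    # fall back to the prior in that case.
    return f1 * prior1 / evidence if evidence > 0.0 else prior1


# Made-up training pairs: (coexpression, coinheritance) for L=1 and L=0.
linked = [(0.8, 0.7), (0.9, 0.8), (0.7, 0.9)]
unlinked = [(0.1, 0.2), (0.0, -0.1), (0.2, 0.1), (-0.1, 0.0)]
p_high = posterior_linked((0.8, 0.8), linked, unlinked)
p_low = posterior_linked((0.0, 0.0), linked, unlinked)
print(p_high, p_low)
```

Because each kernel is centered on a joint training point, the estimated densities keep the dependence between the evidence types instead of assuming independence as naïve Bayes does. The same functions accept evidence vectors of any length, so the N = 3 case only changes the tuples passed in.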