Networks of Protein Interactions: Network Alignment
Antal Novak
CS 374, Lecture 6
10/13/2005
Nuke: Scalable and General Pairwise and Multiple Network Alignment
Flannick, Novak, Srinivasan, McAdams, Batzoglou (2005)

Recap
Network Integration
• Combine data from multiple sources to obtain robust probabilities of interaction
• Can be performed in a high-throughput manner
“Whatcha gonna do with it?”
• Network alignment!

Motivation
• Sequence alignment seeks to identify conserved DNA or protein sequence
• Intuition: conservation implies functionality
    EFTPPVQAAYQKVVAGV (human)
    DFNPNVQAAFQKVVAGV (pig)
    EFTPPVQAAYQKVVAGV (rabbit)
• By similar intuition, subnetworks conserved across species are likely functional modules

Network Alignment
• “Conserved” means two subgraphs contain proteins serving similar functions, having similar interaction profiles
• Key word is similar, not identical
[Figure: aligned subgraphs, with one node pair labeled "mismatch/substitution"]

Earlier approaches: interologs
• Interactions conserved in orthologs
• Orthology is a fuzzy notion
• Sequence similarity not necessary for conservation of function

Earlier approaches: PathBLAST
Kelley et al. (2003)
• Goal: identify conserved pathways (chains)
• Idea: can be done efficiently by dynamic programming if networks are DAGs
[Figure: aligned pathway with node pairs A/A’ (match), B (gap), C/X’ (mismatch), D/D’ (match); Score: match + gap + mismatch + match]
• Problem: networks are neither acyclic nor directed
• Solution: eliminate cycles by imposing a random ordering on nodes, perform DP; repeat many times
• In expectation, finds conserved paths of length L within networks of size n in O(L!n) time
• Drawbacks
    • Computationally expensive
    • Restricts search to specific topology
[Figure: example random node orderings: 1 4 2 3 5; 2 1 4 5 3; 5 2 1 3 4]

Earlier approaches: MaWISh
Koyuturk et al. (2004)
• Goal: identify conserved multi-protein complexes (clique-like structures)
• Idea: such structures will likely contain at least one hub (high-degree node)
• Algorithm: start by aligning a pair of homologous hubs, extend greedily
• Efficient running time, but also only solves a specific case

A General Network Aligner: Goals
• Solve restrictions of existing approaches
• Should extend gracefully to multiple alignment
    • PathBLAST was extended to 3-way alignment, but the extension scales exponentially in the number of species
• Should not restrict search to specific network topologies (cliques/pathways)
• Must be efficient in running time
• Useful application for biologists: given a candidate module, align to a database of networks (“query-to-database”)
[Figure: a query module and a database of networks]

A General Network Aligner: Model
• Earlier approaches aligned pairs of nodes
• Instead, alignment as an equivalence relation: an equivalence class consists of proteins evolved from a common ancestral protein
• Can contain multiple proteins in the same species (paralogs)
• Handles multiple alignment in an obvious way
[Figure: example showing a hypothetical ancestral module, its descendants, and the resulting equivalence classes; one class contains paralogs]

A General Network Aligner: Scoring
• Probabilistic scoring of alignments:
    • M: alignment model (network evolved from a common ancestor)
    • R: random model (nodes and edges picked at random)
• Nodes and edges scored independently:
    $\log \frac{P(\text{nodes} \mid M)}{P(\text{nodes} \mid R)} + \log \frac{P(\text{edges} \mid M)}{P(\text{edges} \mid R)}$
[Figure: example alignment with node scores 2.5, 4.0, 1.5, 3.0 and edge scores 0.8, 0.4, -0.4, 0.8, 1.2, -0.3, 0.6, 0.5, 0.6, -0.2, giving $S = S_N + S_E = 11.0 + 4.0$]

Node scores: simple
• Weighted Sum-Of-Pairs (SOP)
    • Each equivalence class is scored as the sum (over pairs $n_i, n_j$) of $w_{ij} \log P(n_i, n_j)$, where $w_{ij}$ is a weight on the phylogenetic tree
[Figure: phylogenetic tree over H. pylori, M. tuberculosis, C. crescentus, and E. coli, with leaves numbered 1 to 4 and pair weights $w_{12} = 0.5$, $w_{13} = 0.25$, $w_{14} = 0.25$, $w_{23} = 0.25$, $w_{24} = 0.25$, $w_{34} = 0.5$]
• Alignment model
    • Based on BLAST pairwise sequence alignment scores $S_{ij}$
    • Intuition: most proteins descended from a common ancestor have sequence similarity
    • $P_M(n_i, n_j) = P(\text{BLAST score } S_{ij} \mid n_i, n_j \text{ homologous})$
• Random model
    • Nodes picked at random
    • $P_R(n_i, n_j) = P(\text{BLAST score } S_{ij})$

Edge scores: more complicated
• Edge scores in earlier aligners rewarded high edge weights
    • But this biases towards clique-like topology!
• Don’t want solely conservation either
    • This alignment has highly conserved (zero-weight) edges: [Figure: alignment whose edges all have zero weight]
• Non-trivial tradeoff in pairwise alignment of full networks

ESMs: A New Edge-Scoring Paradigm
• Idea: assign each node a label from a finite alphabet Σ, and define edge likelihood in terms of the labels it connects
• During alignment, assign labels which maximize score
• E: symmetric matrix of probability distributions; E(x, y) is the distribution of edge weights between nodes labeled x and y
• For query-to-database alignment, use a module ESM
    • One label for each node in the query module
        • Tractable because queries are usually small (~10-40 nodes)
    • For each pair of nodes $(n_i, n_j)$ in the query, let E(i, j) be a Gaussian centered at $c_{ij}$ = weight of the $(n_i, n_j)$ edge
• Multiple alignment gives us more information about conservation
• Can iteratively improve the ESM to adjust mean and deviation based on weights of edges between aligned pairs of query nodes
    • Easily implemented using kernel density estimation (KDE)

A General Network Aligner: Algorithm
• Given this model of network alignment and scoring framework, how to efficiently find alignments between a pair of networks (N1, N2)?
• Constructing every possible set of equivalence classes clearly prohibitive
• Idea: seeded alignment
• Inspired by seeded sequence alignment (BLAST)
• Identify regions of the network in which “good” alignments are likely to be found
    • MaWISh does this, using high-degree
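
As a rough illustration of the random-ordering idea from the PathBLAST slides, the Python sketch below searches a single weighted undirected graph for a high-scoring path. It is not PathBLAST itself: the function name best_path_score, the (u, v, weight) edge format, the additive path score, and the default trial count are all assumptions made for illustration, and the real method works on aligned node pairs with match, mismatch, and gap scores.

```python
import random


def best_path_score(edges, nodes_in_path, trials=500, seed=0):
    """Best additive score of a simple path with `nodes_in_path` nodes.

    edges: list of (u, v, weight) tuples for an undirected graph
           (hypothetical input format, not taken from the slides).
    Each trial draws a random node ordering, keeps only the edge direction
    that goes from lower to higher rank (so the graph becomes a DAG),
    and runs a layered dynamic program over path lengths.
    """
    rng = random.Random(seed)
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    best = float("-inf")
    for _ in range(trials):
        order = nodes[:]
        rng.shuffle(order)
        rank = {n: i for i, n in enumerate(order)}

        # incoming[v] holds (u, weight) for every edge oriented u -> v.
        incoming = {n: [] for n in nodes}
        for u, v, w in edges:
            lo, hi = (u, v) if rank[u] < rank[v] else (v, u)
            incoming[hi].append((lo, w))

        # layer[v] = best score of a path with k nodes ending at v.
        layer = {n: 0.0 for n in nodes}  # k = 1: single-node paths
        for _k in range(2, nodes_in_path + 1):
            nxt = {}
            for v in nodes:
                cands = [layer[u] + w for u, w in incoming[v] if u in layer]
                if cands:
                    nxt[v] = max(cands)
            layer = nxt
        if layer:
            best = max(best, max(layer.values()))
    return best


# Toy graph (made-up weights): the best 4-node path is a-b-c-d.
toy_edges = [("a", "b", 1.0), ("b", "c", 2.0), ("c", "d", 3.0), ("a", "c", 0.5)]
print(best_path_score(toy_edges, 4))
```

A particular path with L nodes survives a random ordering only if its nodes appear in increasing or decreasing rank, which happens with probability 2/L!. That is why many trials are needed, and it is in line with the L! factor in the running time quoted on the slide.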
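
The weighted sum-of-pairs node score from the Scoring slides can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the per-pair term is taken to be the log-ratio of the alignment-model probability P_M to the random-model probability P_R, which the slides suggest but do not spell out; the function name class_node_score and the dictionary layout are hypothetical; and the probabilities in the example are made up. Only the pair weights come from the tree figure on the slide.

```python
import math
from itertools import combinations

# Pair weights as read off the phylogenetic tree figure on the slide.
SLIDE_WEIGHTS = {
    frozenset((1, 2)): 0.5,
    frozenset((1, 3)): 0.25,
    frozenset((1, 4)): 0.25,
    frozenset((2, 3)): 0.25,
    frozenset((2, 4)): 0.25,
    frozenset((3, 4)): 0.5,
}


def class_node_score(members, weight, p_model, p_random):
    """Weighted sum-of-pairs score for one equivalence class.

    members:  species indices of the proteins in the class
    weight:   maps frozenset({i, j}) to the tree weight w_ij
    p_model:  maps frozenset({i, j}) to P_M(n_i, n_j)  (assumed given)
    p_random: maps frozenset({i, j}) to P_R(n_i, n_j)  (assumed given)
    """
    total = 0.0
    for i, j in combinations(members, 2):
        pair = frozenset((i, j))
        # Assumption: each pair contributes w_ij * log(P_M / P_R).
        total += weight[pair] * math.log(p_model[pair] / p_random[pair])
    return total


# Made-up probabilities for a class with one protein in species 1 and 2.
example_pm = {frozenset((1, 2)): 0.8}
example_pr = {frozenset((1, 2)): 0.2}
print(class_node_score([1, 2], SLIDE_WEIGHTS, example_pm, example_pr))
```

A class with a single protein has no pairs and scores zero under this sketch. Handling paralogs (several proteins from the same species in one class) would need a weight for same-species pairs, which the slide figure does not give.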